Saturday, May 23, 2020
Division Divided by Division
The debate that rages on about the COVID-19 quarantine suffers from the classic complexity of trade-offs that are non-resolvable. Shutting down saved lives but it also destroyed [socioeconomic] lives. Opening up too soon or too quickly will cost lives, but remaining closed will destroy more lives. Saving lives means delaying herd immunity, but achieving herd immunity means losing lives. The list of trade-offs goes on, and the real socioeconomic costs of quarantine measures can only be justified by estimated benefits of undertaking them, making such measures difficult to defend in concrete terms. In these kinds of situations, no amount of data or science can build adequate consensus, and this is especially true when the data comes into question (e.g. the variance in death rate based on the true infection rate, which we do not know) and the science comes into question (e.g. the general population being unable to grasp that the very basis of the scientific method). The only thing that can build consensus in these types of situations is trust in government, as in: "I know that they do not have all the answers, I know that they might get it wrong from time to time, but I also know that they are taking advantage of the best data and science that the country has to offer and are making the most informed decision they can on my behalf." I need not say why we do not have this level of trust in government today, but I do need to say that the breakdown along party lines is a sad state of affairs. If we had stronger leadership, we would be more unified, especially in troubled times. I think we can all agree on that, but maybe not.
Wednesday, May 20, 2020
Salesforce Einstein Analytics Dataflow to Deduplicate a Dataset
Let's say you have the following data:
Key,Value,Timestamp
K1,V1,2020-05-01
K1,V2,2020-05-02
K1,V3,2020-05-03
K1,V4,2020-05-04
K2,V1,2020-05-04
K2,V2,2020-05-04
K2,V3,2020-05-02
K2,V4,2020-05-01
K3,V4,2020-05-01
K3,V3,2020-05-02
K3,V2,2020-05-03
K3,V1,2020-05-04
K4,V4,2020-05-04
K4,V3,2020-05-03
K4,V2,2020-05-02
K4,V1,2020-05-01
K5,V1,2020-05-01
K5,V2,2020-05-01
K5,V3,2020-05-01
K5,V4,2020-05-01
Further, let's say that you want to remove duplicate rows (based on the Key) and keep the latest row (based on the Timestamp), with the additional complexity of there being duplicate Timestamps.
Your target result would consist of the following, reduced data:
Key,Value,Timestamp
K1,V4,2020-05-04
K2,V2,2020-05-04
K3,V1,2020-05-04
K4,V4,2020-05-04
K5,V4,2020-05-01
To achieve this result, you can generalize the following approach, in reference to the sample dataflow included in its entirety at the bottom of this post.
Dataflow:
{
"getDataset": {
"action": "edgemart",
"parameters": {
"alias": "KeyValueTimestamp"
}
},
"computePartitionCounter": {
"action": "computeRelative",
"parameters": {
"source": "getDataset",
"computedFields": [
{
"name": "partitionCounter",
"label": "partitionCounter",
"expression": {
"saqlExpression": "previous(partitionCounter) + 1",
"type": "Numeric",
"scale": 2,
"default": "1"
}
}
],
"orderBy": [
{
"name": "Timestamp",
"direction": "asc"
}
],
"partitionBy": [
"Key"
]
}
},
"computeKeepRow": {
"action": "computeRelative",
"parameters": {
"source": "computePartitionCounter",
"computedFields": [
{
"name": "keepRow",
"label": "keepRow",
"expression": {
"saqlExpression": "case when current(partitionCounter) == first(partitionCounter) then \"true\" else \"false\" end",
"type": "Text"
}
}
],
"orderBy": [
{
"name": "partitionCounter",
"direction": "desc"
}
],
"partitionBy": [
"Key"
]
}
},
"filterDataset": {
"action": "filter",
"parameters": {
"source": "computeKeepRow",
"saqlFilter": "keepRow == \"true\""
}
},
"sliceDataset": {
"action": "sliceDataset",
"parameters": {
"mode": "drop",
"fields": [
{
"name": "keepRow"
},
{
"name": "partitionCounter"
}
],
"source": "filterDataset"
}
},
"registerDataset": {
"action": "sfdcRegister",
"parameters": {
"alias": "KeyValueTimestampDeduplicated",
"name": "KeyValueTimestampDeduplicated",
"source": "sliceDataset"
}
}
}
Key,Value,Timestamp
K1,V1,2020-05-01
K1,V2,2020-05-02
K1,V3,2020-05-03
K1,V4,2020-05-04
K2,V1,2020-05-04
K2,V2,2020-05-04
K2,V3,2020-05-02
K2,V4,2020-05-01
K3,V4,2020-05-01
K3,V3,2020-05-02
K3,V2,2020-05-03
K3,V1,2020-05-04
K4,V4,2020-05-04
K4,V3,2020-05-03
K4,V2,2020-05-02
K4,V1,2020-05-01
K5,V1,2020-05-01
K5,V2,2020-05-01
K5,V3,2020-05-01
K5,V4,2020-05-01
Further, let's say that you want to remove duplicate rows (based on the Key) and keep the latest row (based on the Timestamp), with the additional complexity of there being duplicate Timestamps.
Your target result would consist of the following, reduced data:
Key,Value,Timestamp
K1,V4,2020-05-04
K2,V2,2020-05-04
K3,V1,2020-05-04
K4,V4,2020-05-04
K5,V4,2020-05-01
To achieve this result, you can generalize the following approach, in reference to the sample dataflow included in its entirety at the bottom of this post.
- Get the dataset ;-)
- computeRelative (computePartitionCounter)
- Partition by the Key
- Order by the Timestamp (ascending)
- Compute a partitionCounter column (number, default = 1)
- saqlExpression: previous(partitionCounter) + 1
- computeRelative (computeKeepRow)
- Partition by the Key
- Order by the partitionCounter (descending)
- Note that weather you sort by descending timestamp and then ascending partitionCounter, or vice versa (as I have done here) is immaterial
- Compute a keepRow column (text)
- saqlExpresion: case when current(partitionCounter) == first(partitionCounter) then "true" else "false" end
- Note that in the case of K2 and K5, since some or all of the rows have the same Timestamp, keepRow will be "true" for one of them, arbitrarily, more than likely based on any internal ordering with Einstein Analytics.
- Filter by keepRow == "true"
- Slice off parititionCounter and keepRow
- Register the de-duplicated dataset
In my example, I went from this...
...to this...
Cheers!
Dataflow:
{
"getDataset": {
"action": "edgemart",
"parameters": {
"alias": "KeyValueTimestamp"
}
},
"computePartitionCounter": {
"action": "computeRelative",
"parameters": {
"source": "getDataset",
"computedFields": [
{
"name": "partitionCounter",
"label": "partitionCounter",
"expression": {
"saqlExpression": "previous(partitionCounter) + 1",
"type": "Numeric",
"scale": 2,
"default": "1"
}
}
],
"orderBy": [
{
"name": "Timestamp",
"direction": "asc"
}
],
"partitionBy": [
"Key"
]
}
},
"computeKeepRow": {
"action": "computeRelative",
"parameters": {
"source": "computePartitionCounter",
"computedFields": [
{
"name": "keepRow",
"label": "keepRow",
"expression": {
"saqlExpression": "case when current(partitionCounter) == first(partitionCounter) then \"true\" else \"false\" end",
"type": "Text"
}
}
],
"orderBy": [
{
"name": "partitionCounter",
"direction": "desc"
}
],
"partitionBy": [
"Key"
]
}
},
"filterDataset": {
"action": "filter",
"parameters": {
"source": "computeKeepRow",
"saqlFilter": "keepRow == \"true\""
}
},
"sliceDataset": {
"action": "sliceDataset",
"parameters": {
"mode": "drop",
"fields": [
{
"name": "keepRow"
},
{
"name": "partitionCounter"
}
],
"source": "filterDataset"
}
},
"registerDataset": {
"action": "sfdcRegister",
"parameters": {
"alias": "KeyValueTimestampDeduplicated",
"name": "KeyValueTimestampDeduplicated",
"source": "sliceDataset"
}
}
}
Tuesday, May 5, 2020
Conscientious Courtesy
Friday, May 1, 2020
Salesforce OAuth 2.0 Web Server Flow Using cURL
If you are in need of validating a Salesforce API from start to finish using cURL - i.e. from authorization code to access token to a successful API call using the access token - try this:
Create a Connected App in your Salesforce Org:
Check the Salesforce documentation for Create a Connected App with OAuth. Include https://localhost as one of the Callback URLs to follow along with this example, or use whatever URL you like, but just be sure to modify the example below accordingly.
Once you have the Connected App, you can copy-paste the following URL in browser:
Note: If you are attempting to do this in Production, then use 'login' instead of 'test' in the URL.
https://test.salesforce.com/services/oauth2/authorize?client_id=[Consumer Key from Connected App]&response_type=code&scope=api&access_type=offline&redirect_uri=https://localhost
You will be presented with a login screen.
After successfully logging in, your browser will be redirected to https://localhost, which, obviously, will not render a working page, but the authorization code will be available in the URL window:
Copy the value of the code (i.e. code=[authorization_code]) into a text editor, and then URL decode any characters that are URL encoded. In my experience, this is typically two %3D characters at the end of the code, which equates to two equal signs, but I make no claim as to the standard.
Use the Authorization Code to get the Access Token via cURL
Note: If you are attempting to do this in Production, then use 'login' instead of 'test' in the URL.
curl --request POST 'https://test.salesforce.com/services/oauth2/token' --data 'client_id=[Consumer Key from Connected App]' --data 'client_secret=[Consumer Secret from Connected App]' --data 'code=[Authorization Code from Previous Call]' --data 'redirect_uri=https://localhost' --data 'grant_type=authorization_code'
This will return the following response (in non-pretty print format):
{"access_token":"[access_token]","signature":"[signature_redacted]","scope":"api","instance_url":"https://[instance].my.salesforce.com","id":"https://test.salesforce.com/id/[id_redacted]/[id_redacted]","token_type":"Bearer","issued_at":"1588342208615"}
Use the access_token to make API calls:
Note: The following example is highly specific (e.g. POST, json, etc), so adjust accordingly.
curl --request POST '[REST API URI]' --header 'Authorization: Bearer [access_token]' --header 'Content-Type: application/json' --data '{"key_1" : "value_1","key_2" : "value_2"}'
Special Note: The examples worked using Git Bash on Windows, and may need to be modified to work in the Windows Command Window (i.e. you may need to enclose arguments in double quotes and escape double quotes in JSON, etc). Just beware of this nuance.
Create a Connected App in your Salesforce Org:
Check the Salesforce documentation for Create a Connected App with OAuth. Include https://localhost as one of the Callback URLs to follow along with this example, or use whatever URL you like, but just be sure to modify the example below accordingly.
Once you have the Connected App, you can copy-paste the following URL in browser:
Note: If you are attempting to do this in Production, then use 'login' instead of 'test' in the URL.
https://test.salesforce.com/services/oauth2/authorize?client_id=[Consumer Key from Connected App]&response_type=code&scope=api&access_type=offline&redirect_uri=https://localhost
You will be presented with a login screen.
After successfully logging in, your browser will be redirected to https://localhost, which, obviously, will not render a working page, but the authorization code will be available in the URL window:
Copy the value of the code (i.e. code=[authorization_code]) into a text editor, and then URL decode any characters that are URL encoded. In my experience, this is typically two %3D characters at the end of the code, which equates to two equal signs, but I make no claim as to the standard.
Use the Authorization Code to get the Access Token via cURL
Note: If you are attempting to do this in Production, then use 'login' instead of 'test' in the URL.
curl --request POST 'https://test.salesforce.com/services/oauth2/token' --data 'client_id=[Consumer Key from Connected App]' --data 'client_secret=[Consumer Secret from Connected App]' --data 'code=[Authorization Code from Previous Call]' --data 'redirect_uri=https://localhost' --data 'grant_type=authorization_code'
This will return the following response (in non-pretty print format):
{"access_token":"[access_token]","signature":"[signature_redacted]","scope":"api","instance_url":"https://[instance].my.salesforce.com","id":"https://test.salesforce.com/id/[id_redacted]/[id_redacted]","token_type":"Bearer","issued_at":"1588342208615"}
Use the access_token to make API calls:
Note: The following example is highly specific (e.g. POST, json, etc), so adjust accordingly.
curl --request POST '[REST API URI]' --header 'Authorization: Bearer [access_token]' --header 'Content-Type: application/json' --data '{"key_1" : "value_1","key_2" : "value_2"}'
Special Note: The examples worked using Git Bash on Windows, and may need to be modified to work in the Windows Command Window (i.e. you may need to enclose arguments in double quotes and escape double quotes in JSON, etc). Just beware of this nuance.
Subscribe to:
Posts (Atom)