For those who arrived directly at this post, I'd strongly suggest reading the following in sequence to gain context

A data breach is the intentional or inadvertent exposure of confidential information to unauthorized parties. In the digital era, data has become one of the most critical components of an enterprise. Data leakage poses serious threats to organizations, including significant reputation damage and financial losses. As the volume of data is growing exponentially and data breaches are happening more frequently than ever before, detecting and preventing data loss has become one of the most pressing security concerns for enterprises.

There are multiple points and opportunities for an enterprise to deploy effective protections to secure sensitive data against inadvertent or malicious leak threats that may appear during data storage, usage, or movement.

On top of the Code Property Graph, we use NLP and ML to identify variable names that are sensitive and then categorize/classify them. You can also add and rank your own sensitive terms and models in the baseline dictionary.
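
To make the dictionary idea concrete, here is a minimal sketch of dictionary-based name matching in plain Scala. It is not how Ocular's NLP/ML classification works, and it does not use the Ocular API: the `SensitiveNames` object, the category names, and the terms are all assumptions made up for illustration.

```scala
// Toy dictionary-based classifier. The categories and terms are illustrative
// assumptions, not Ocular's baseline dictionary.
object SensitiveNames {
  // category -> terms; categories listed earlier rank higher and win on overlap
  val dictionary: Seq[(String, Seq[String])] = Seq(
    "credential" -> Seq("password", "passwd", "secret", "token", "apikey"),
    "pii"        -> Seq("ssn", "email", "phone", "dob", "address"),
    "financial"  -> Seq("creditcard", "cardnumber", "iban", "cvv")
  )

  // Reduce camelCase / snake_case names to lowercase alphanumerics
  private def normalize(name: String): String =
    name.toLowerCase.replaceAll("[^a-z0-9]", "")

  // Return the first category that has a term occurring in the normalized name
  def classify(name: String): Option[String] = {
    val n = normalize(name)
    dictionary.collectFirst {
      case (category, terms) if terms.exists(t => n.contains(t)) => category
    }
  }
}
```

Adding your own terms would amount to extending `dictionary`, and ranking them to reordering it. Plain substring matching like this produces false positives, which is one reason a real classifier needs more than a word list.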

Execute the following commands in the Ocular shell.

**//Ensure that your environment is set up and initialized prior to executing any commands below**

**//List all sensitive User and System defined types in your application**

Let’s pick one such sensitive data variable and track whether it moves from application logic to a log file without obfuscation, encryption, or redaction.

**//Let's pick the `User` type and investigate if it's being leaked in clear text to the log file**
val source = cpg.local.evalType(".*User.*").referencingIdentifiers
val sink = cpg.method.fullName(globals.javaLogger).parameter

**//Extract the pretty printed flow from above, convert to JSON and persist to file system or create an issue in GitHub using the integration API**
val dataLeakFlow = sink.reachableBy(source).flows.passesNot("(obfuscate|redact)").passes(globals.javaLogger)

val flowtrace = flows.getFlowTrace(dataLeakFlow)

**//write json to file (requires `import java.io.{File, PrintWriter}`)**
val pw = new PrintWriter(new File("/tmp/dataleak.json"))
pw.write(flowtrace.toString)
pw.close()

**//make sure to set your GitHub API access token prior to using this**
github.createIssueInGitHub(flowtrace, globals.accessToken, globals.owner, globals.reponame, "Sensitive PII data is being logged to channel without being redacted/obfuscated - compliance violation")
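
The `passesNot("(obfuscate|redact)")` step in the query above keeps only flows that never go through a method whose name matches the pattern. As a rough mental model, here is a toy version in plain Scala. A "flow" here is just an ordered list of method names, and `Flow`, `passes`, and `passesNot` are hypothetical stand-ins written for this sketch, not Ocular's types or implementation.

```scala
// Toy model of flow filtering. Flow, passes, and passesNot are hypothetical
// stand-ins for illustration; they are not part of the Ocular API.
object FlowFilter {
  // A flow is modelled as the ordered method names the data travels through
  final case class Flow(methods: List[String])

  // True if any method on the flow contains a match for the regex pattern
  def passes(flow: Flow, pattern: String): Boolean = {
    val regex = pattern.r
    flow.methods.exists(m => regex.findFirstIn(m).isDefined)
  }

  // Keep only the flows that never go through a matching method
  def passesNot(flows: List[Flow], pattern: String): List[Flow] =
    flows.filterNot(f => passes(f, pattern))

  def main(args: Array[String]): Unit = {
    val leaky = Flow(List("UserService.load", "Logger.info"))
    val safe  = Flow(List("UserService.load", "Redactor.redact", "Logger.info"))
    // Only the flow that skips redaction survives the filter
    passesNot(List(leaky, safe), "(obfuscate|redact)")
      .foreach(f => println(f.methods.mkString(" -> ")))
  }
}
```

In this model, a flow that reaches the logger after a `redact` call is dropped from the result, so whatever remains is a candidate leak worth reporting.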

Given that we discovered earlier that the engineer in this code base had hard-coded credentials, let's follow these credentials to verify whether they are being leaked to the log file as well.

val source = cpg.method.literal.code(globals.awsSecret) 
val sink = cpg.method.fullName(globals.javaLogger).parameter

Image Courtesy : Luther Bottril

When an exception is thrown due to some error detected during program execution, the try-catch block will catch it. However, if the exception is re-thrown, the error propagates up the call chain and is eventually displayed to the end user.

The problem with this undisciplined exception propagation is the nature and amount of information displayed to the user. It shows up when an error message contains the web server’s name, the server version, the database’s name, the full stack trace, and other information that end users should not see. These details are often among the first things an attacker tries to obtain. Knowing the server’s name, its version, and the programming language the website uses allows the attacker to search for known vulnerabilities and exploit them.
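
To illustrate the difference, here is a minimal sketch in plain Scala that contrasts a handler echoing exception details to the caller with one that keeps them on the server. The `ErrorBoundary` object, the `Response` type, and the `log` callback are all hypothetical names invented for this example. It is one possible way to structure such a boundary, not a prescribed fix.

```scala
import java.util.UUID
import scala.util.control.NonFatal

// Hypothetical error boundary for illustration only
object ErrorBoundary {
  // Stand-in for whatever the web layer sends back to the end user
  final case class Response(status: Int, body: String)

  // Leaky: the exception text and stack trace end up in the user-facing body
  def leakyHandler(action: () => String): Response =
    try Response(200, action())
    catch {
      case NonFatal(e) =>
        Response(500, e.toString + "\n" + e.getStackTrace.mkString("\n"))
    }

  // Safer: details go to a server-side log callback; the user sees only a
  // generic message plus a correlation id that support staff can look up
  def safeHandler(action: () => String, log: String => Unit): Response =
    try Response(200, action())
    catch {
      case NonFatal(e) =>
        val id = UUID.randomUUID().toString
        log(s"[$id] ${e.getClass.getName}: ${e.getMessage}")
        Response(500, s"Something went wrong. Reference: $id")
    }
}
```

Note that the server-side log line is itself a sink: if the exception message carries sensitive data, the logging concerns discussed earlier in this post apply to it as well.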

Example screen shot

This vulnerability is hard to detect by manual inspection in programs without a well-structured exception handling policy, which is a very common scenario. In such programs, it becomes very hard for the code reviewer to inspect all classes and execution paths to determine which exceptions are thrown, which are properly handled, and which are not. In medium and large projects, this can be very complicated and time-consuming.