Cegal tech blog

Is it possible for AI to detect SQL injections using free models?

Written by Daniel Andersson | Jan 26, 2024 7:33:14 AM

With advances in AI and the availability of free models, an idea was born: to investigate whether current free AI models can be used to identify SQL injections. This article describes the idea, the validation process, and the outcome.

Information security is vital for our digital world, so we need innovative solutions to identify threats early. Audit logs show exactly what actions are being taken in our IT systems, with the caveat that the amount of information is massive. Emerging AI technology opens new possibilities for analyzing large amounts of data and reaching more human-like decisions.

A limitation when analyzing audit data, especially SQL audit logs, is that the logs may contain sensitive data, so it may not be possible or advisable to send them to external services such as cloud-based AI services. As such, this article focuses only on models running on-premises and, for cost reasons, only on free models.

It should also be noted that there is no model specifically designed and trained to detect SQL injections. This is therefore an experiment to see what a more generic large language model (LLM) is capable of when analyzing SQL audit logs.

Technical implementation

The experiments were run on a generic workstation with an NVIDIA RTX 4070 Ti-based GPU. The models run on an Ubuntu Linux installation using the Ollama software package.

A vulnerable website with a MySQL database was deployed using Docker containers. MySQL was configured to enable audit logging (here, the general query log) with the following SQL commands:

SET global log_output = 'FILE';

SET global general_log_file='/tmp/sql.log';

SET global general_log = 1;

In a production setup, the log file should not be put in /tmp as it could lead to unauthorized access to sensitive information. 
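Once the log is enabled, each executed statement ends up as a line in the log file, and those lines need to be reduced to plain SQL before they can be handed to a model. The following is a minimal sketch of that step, assuming the usual general query log layout of timestamp, thread id, command type, and argument. The function name extract_queries and the sample lines are illustrative and not taken from the original experiment, and statements spanning several lines are not handled.

```python
import re

# Assumed layout of a general query log line (an assumption, not from the article):
#   <ISO timestamp> <thread id> <command> <argument>
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\S+)\s+(\d+)\s+(\w+)\s+(.*)$")


def extract_queries(lines):
    """Yield the SQL text of every 'Query' entry in an iterable of log lines."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        # Skip header lines and non-query commands such as Connect or Quit.
        if match and match.group(3) == "Query":
            yield match.group(4)


# Illustrative sample lines, written by hand for this sketch.
SAMPLE_LINES = [
    "Time                 Id Command    Argument",
    "2024-01-26T07:33:14.000000Z\t    8 Connect\troot@localhost on app using TCP/IP",
    "2024-01-26T07:33:15.000000Z\t    8 Query\tselect username, salary from users "
    "where username = 'test' and password = 'pwd'",
]

if __name__ == "__main__":
    for query in extract_queries(SAMPLE_LINES):
        print(query)
```

With the configuration above, the same function could be fed an open file handle for /tmp/sql.log instead of the sample list.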

SQL Injection

The software SQLMap was used to perform the SQL Injection attempts towards the vulnerable web application, running it in as extensive a mode as possible to generate a large set of logs.

A real-world example may be less noisy and less obvious.

SQL Audit logs

The SQL audit logs show a clear pattern of strange behavior. Even a few audit records would easily raise concerns if read by a human with knowledge of SQL and SQL injections. A sample of records:

Invoking AI model

With the Ollama software package, free LLMs are easy to fetch and run. After Ollama is installed, the following command starts a model with an interactive CLI prompt and also enables the API.

ollama run <model-name>

The first model tested is the general-purpose Mistral model, which is trained on generic data scraped from the Internet and is therefore not specialized in any way for SQL or security.

The command used to invoke the model is:

ollama run mistral

Once the model is downloaded and started, the user gets an interactive input prompt. Composing the following prompt from the log data gives a clear answer.

Prompt:

I have audit logs enabled in MySQL and three log entries are showing the following that I see as odd, can you tell me if this is a SQL injection? The log data is:

select username, salary from users where username = 'test AND EXTRACTVALUE(1910,CONCAT(0x5c,0x716b6a6b71,(SELECT (ELT(1910=1910,1))),0x717a716271))-- HSJZ' and password = 'pwd'

select username, salary from users where username = 'test' AND 5839=CAST((CHR(113)||CHR(107)||CHR(106)||CHR(107)||CHR(113))||(SELECT (CASE WHEN (5839=5839) THEN 1 ELSE 0 END))::text||(CHR(113)||CHR(122)||CHR(113)||CHR(98)||CHR(113)) AS NUMERIC) AND 'WuLa'='WuLa' and password = 'pwd'

select username, salary from users where username = 'test';SELECT PG_SLEEP(5)--' and password = 'pwd'

Answer:

 

To make detection more challenging, another log entry is chosen in which the injection attempt is less obvious. The log entry:

select username, salary from users where username = 'test AND 3790=3790-- Vsnp' and password = 'pwd'

The prompt is updated.

Prompt:

I have audit logs enabled in MySQL, and one log entry shows the following that I see as odd, can you tell me if this is a SQL injection? The log data is:

select username, salary from users where username = 'test AND 3790=3790-- Vsnp' and password = 'pwd'

Answer:

The language model is still able to identify that this is an SQL injection. It also refers back to the previous question, since it is in the same dialogue.

A last test is performed with an SQL statement that is a valid request from the application, to check for a possible false-positive response. The log entry:

select username, salary from users where username = 'test' and password = 'pwd'

The prompt is updated.

Prompt:

I have audit logs enabled in MySQL, and one log entry shows the following which I see as odd. Can you tell me if this is a SQL injection? The log data is:

select username, salary from users where username = 'test' and password = 'pwd'

Answer:

Testing only with this limited set of entries from logs, it appears that a generic language model is aware of what could be a potential SQL injection without being specifically trained to detect security threats. Most probably, there would be both false positives as well as missed detections if the dataset used to test were more extensive.

Also, the answer in its current form is hard to use in automation. We therefore need to tweak the prompt so that the answer contains some form of binary variable that is either true or false.

Prompt:

I have audit logs enabled in MySQL, and one log entry shows the following, which I see as odd. Can you tell me if this is a SQL injection? Use a variable named injection and set it to true if a SQL injection is detected, otherwise, set it to false, for example your answer “injection=false”. The log data is:

select username, salary from users where username = 'test' and password = 'pwd' 

For a known injection, the prompt would be:

I have audit logs enabled in MySQL and one log entry are showing the following that I see as odd, can you tell me if this is a SQL injection? Use a variable named injection and set it to true if a SQL injection is detected. Otherwise, set it to false, for example, your answer “injection=false.” The log data is:

select username, salary from users where username = 'test AND 3790=3790-- Vsnp' and password = 'pwd'

Answers:

Ollama API request

By default, Ollama exposes its API on TCP port 11434. To create a sample request using curl, an input file with the required JSON is created using one of the above samples:
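A request body of this kind could be generated rather than written by hand. The following is a minimal sketch, assuming the model, prompt, and stream fields of Ollama's /api/generate endpoint. The names PROMPT_TEMPLATE, build_request, and write_request_file are illustrative, and the exact file used in the experiment may have looked different.

```python
import json

# Prompt wording taken from the binary-variable prompt earlier in the article.
PROMPT_TEMPLATE = (
    "I have audit logs enabled in MySQL, and one log entry shows the following, "
    "which I see as odd. Can you tell me if this is a SQL injection? "
    "Use a variable named injection and set it to true if a SQL injection is "
    "detected, otherwise, set it to false, for example your answer "
    '"injection=false". The log data is:\n\n{entry}'
)


def build_request(entry, model="mistral", stream=True):
    """Return the JSON body for a generate request as a string."""
    return json.dumps({
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(entry=entry),
        "stream": stream,
    })


def write_request_file(path, entry):
    """Write the request body to a file that curl can send with -d @path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_request(entry))
```

Calling write_request_file("input.txt", entry) with one of the log entries above would produce a file suitable for the curl command. Using json.dumps takes care of escaping the quotes that appear in the SQL statement.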

A request is sent using the command:

curl http://localhost:11434/api/generate -d "@input.txt"

The API response is in JSON format and is streamed as one JSON object per line, each carrying a small fragment of the generated text (roughly a word). The snippet shows the last few rows:
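Because the answer arrives in fragments, a client has to stitch them back together before it can be read. The following is a minimal sketch, assuming each streamed line is a JSON object with a response text field and a done flag. The function name collect_response and the sample lines are illustrative and are not the output from the experiment.

```python
import json


def collect_response(ndjson_lines):
    """Join the text fragments of a streamed generate response."""
    parts = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the stream
    return "".join(parts)


# Hand-written sample lines for illustration only.
SAMPLE_STREAM = [
    '{"model": "mistral", "response": " injection", "done": false}',
    '{"model": "mistral", "response": "=true", "done": false}',
    '{"model": "mistral", "response": "", "done": true}',
]

if __name__ == "__main__":
    print(collect_response(SAMPLE_STREAM))
```

Setting stream to false in the request would, under the same assumptions, return a single JSON object instead, which avoids this step at the cost of waiting for the full answer.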

 

This approach would make automation possible, where the SQL query is read from the audit log and passed into the AI language model, which responds with both sentences as well as a true or false variable.
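For the automation step, the true or false variable has to be pulled out of the surrounding sentences. The following is a minimal sketch, assuming the model follows the "injection=true" or "injection=false" convention requested in the prompt. The function name parse_verdict is illustrative, and a real pipeline would need to decide what to do when the marker is missing.

```python
import re

# Matches "injection=true" or "injection = False", ignoring case and spacing.
VERDICT = re.compile(r"\binjection\s*=\s*(true|false)\b", re.IGNORECASE)


def parse_verdict(answer):
    """Return True or False for an injection marker, or None if none is found."""
    match = VERDICT.search(answer)
    if match is None:
        return None
    return match.group(1).lower() == "true"


if __name__ == "__main__":
    print(parse_verdict("Based on the log entry, injection=true."))
```

Only the first marker in the answer is used, so an answer that first echoes the example from the prompt and then gives its own verdict would be misread. Returning None for a missing marker lets the caller treat such answers as undecided instead of silently treating them as safe.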

Conclusion

The idea of having AI analyze SQL audit logs seems to be fully implementable, though a generic language model may not be the most effective or accurate way to do it. As such, this should only be taken as an indication of what could be possible with AI models trained to analyze large datasets of logs and detect security threats.

Taking this concept further, there could also be additional improvements. For example, instead of looking at only the SQL audit logs, the implementation could combine the HTTP request data from the client with the SQL Audit logs and the code snippet for the actual SQL request in the prompt to give more context to the AI engine that is to determine if the request is a threat.

Future improvements in models and implementation could allow this to be done in real time, so that requests identified as threats can be rejected.

It is also important to understand that many common SQL injections are already found by non-AI tools, so it must be evaluated where AI can actually improve detection compared to the tools that are already available.

 I hope this was useful! 💚