How to use VirusTotal API for free with python



You have probably used the online file checking service virustotal.com more than once, either to check whether files contain malicious code or to test your own developments. This popular service has a free API, and in this article we will look at how to work with it in Python.


To use the VirusTotal API without restrictions, you need to obtain a paid key, which costs a hefty sum of money: prices start at 700 euros per month. Moreover, the key will not be given to a private person, even one willing to pay this amount.

However, you should not despair, as the service provides its basic functions for free, with a restriction on the number of requests: no more than four per minute. It is not great, but you can live with it.
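
To stay under a per-minute quota like the one described above, requests can simply be spaced out. Below is a minimal sketch of such a throttle. The class name MinIntervalThrottle and its parameters are my own illustration and are not part of any VirusTotal library; pass the quota that applies to your key as max_calls.

```python
import time


class MinIntervalThrottle:
    """Keeps consecutive calls at least period / max_calls seconds apart."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = period / max_calls
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough to respect the interval, then record the call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call throttle.wait() right before every request to the API. Spacing requests evenly is more conservative than the quota strictly requires, but it is the simplest way not to trip the limit.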

The VirusTotal API must not be used in commercial products or services and projects that could cause direct or indirect damage to the antivirus industry.

Obtaining VirusTotal API Key

So, the first thing to do is to register on the VirusTotal website. There is no problem here – I’m sure you’ll manage it. After registering, you can find the access key by selecting the API key menu item.


VirusTotal API versions

As of today, the current version of the API is version 2. A third version is also available. It is still in the official beta stage, but it is already quite usable, all the more so because the capabilities it provides are much greater.

The developers recommend using the third version only for experimental purposes or for non-critical projects so far. We will analyze both versions. The access key is the same for both versions.

VirusTotal API – Version 2

As with other popular web services, working with the API involves sending requests via HTTP and receiving responses.

Version 2 of the API allows you to:

  • send files for scanning;
  • receive a report on previously scanned files, using a file identifier (the SHA-256, SHA-1 or MD5 hash of the file, or the scan_id value from the response received after sending the file);
  • send a URL to the server for scanning;
  • receive a report on previously scanned URLs, using either the URL itself or the scan_id value from the response received after the URL was sent to the server;
  • receive a report on an IP address;
  • receive a report on a domain name.

Errors

If the query was correctly processed and no errors occurred, code 200 (OK) will be returned.

If an error has occurred, however, one of the following codes may be returned:

  • 204 – Request rate limit exceeded. Returned when the quota of allowed requests is exceeded (for a free key the quota is four requests per minute);
  • 400 – Bad request. Returned when the request is formed incorrectly, for example, if required arguments are missing or have invalid values;
  • 403 – Forbidden. Returned if you try to use API functions that are available only with a paid key without having one.
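
As a small illustration, the status codes listed above can be turned into readable messages before trying to parse the response. This is only a sketch: the function name describe_v2_status and the message wording are mine. Note that a 204 response has no body, so response.json() should not be called on it.

```python
# Meanings of the version 2 status codes listed in this article.
V2_STATUS_MEANINGS = {
    200: "OK: the request was processed",
    204: "Request rate limit exceeded",
    400: "Bad request: arguments are missing or invalid",
    403: "Forbidden: this function needs a paid key",
}


def describe_v2_status(status_code):
    """Return a human-readable description of an HTTP status code."""
    default = "Unexpected status {}".format(status_code)
    return V2_STATUS_MEANINGS.get(status_code, default)
```

With requests, this could be used as print(describe_v2_status(response.status_code)) before deciding whether to read the JSON body.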

If the request is generated correctly (HTTP status code is 200), the response will be a JSON object, the body of which contains at least two fields:

  • response_code – if the requested object (file, URL, IP address or domain name) is in the VirusTotal database (i.e. has been tested before) and information about this object can be obtained, the value of this field will be one; if the requested object is in the queue for analysis, the field will be -2; if the requested object is not in the VirusTotal database, the value will be zero;
  • verbose_msg – a more detailed description of the response_code value (for example, "Scan finished, information embedded" once a submitted file has been scanned).

The rest of the information contained in the JSON response object depends on which API function was used.
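
The three response_code values described above can be checked with a tiny helper. This is a sketch; the function name and the returned labels ("found", "queued", "unknown") are my own choice, not something defined by the API.

```python
def interpret_response_code(result):
    """Map the response_code field of a version 2 JSON reply to a short label."""
    code = result.get("response_code")
    if code == 1:
        return "found"       # the object is in the database, a report is available
    if code == -2:
        return "queued"      # the object is still waiting for analysis
    if code == 0:
        return "unknown"     # the object is not in the database
    return "unexpected"      # a value this article does not describe
```

Here result is the dictionary returned by response.json() in the examples of this article.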


Sending a file to a server for scanning

To send a file for scanning, form a POST request to https://www.virustotal.com/vtapi/v2/file/scan. The request must contain the API access key and the file itself (the file size is limited to 32 MB). In Python it may look like this:

import json
import requests
...
api_url = 'https://www.virustotal.com/vtapi/v2/file/scan'
params = dict(apikey='<access key>')
with open('<file path>', 'rb') as file:
  files = dict(file=('<file path>', file))
  response = requests.post(api_url, files=files, params=params)
if response.status_code == 200:
  result=response.json()
  print(json.dumps(result, sort_keys=False, indent=4))
...

Here, replace <access key> with your API access key and <file path> with the path to the file you want to send to VirusTotal. If you do not have the requests library, install it with the pip install requests command.

In response, if everything was successful and the HTTP status code is 200, we get roughly the following picture:

{
  "response_code": 1,
  "verbose_msg": "Scan request successfully queued, come back later for the report",
  "scan_id": "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f-1577043276",
  "resource": "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f",
  "sha1": "3395856ce81f2b7382dee72602f798b642f14140",
  "md5": "44d88612fea8a8f36de82e1278abb02f",
  "sha256": "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f",
  "permalink": "https://www.virustotal.com/file/275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f/analysis/1577043276/"
}

Here we see response_code and verbose_msg values, as well as SHA-256, SHA-1 and MD5 file hashes, a link to the file scan results at permalink, and the file identifier scan_id.

The code examples in this article omit error handling. Keep in mind that exceptions can occur while opening a file or sending requests to the server: FileNotFoundError if there is no file, requests.ConnectionError, requests.Timeout for connection errors, and so on.
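
As a rough sketch of what such error handling could look like, the helper below opens the file and hands it to a send callable, turning the exceptions mentioned above into an error message. The function name, the (result, error) return convention and the network_errors parameter are my own assumptions made to keep the example self-contained.

```python
def upload_with_error_handling(path, send, network_errors=(ConnectionError, TimeoutError)):
    """Open `path`, pass the file object to `send`, and return (result, error).

    `send` is any callable that performs the upload, for example a small
    function wrapping requests.post. `network_errors` is a tuple of the
    exception classes that should be reported as network problems.
    """
    try:
        with open(path, "rb") as file:
            return send(file), None
    except FileNotFoundError:
        return None, "file not found: {}".format(path)
    except network_errors as exc:
        return None, "network error: {}".format(exc)
```

When the upload is done with requests, pass network_errors=(requests.ConnectionError, requests.Timeout), since those classes are not the built-in exceptions of the same name.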

Receiving a report of the last file scan

Using any of the hashes or the scan_id value from the response, you can get a report on the last scan of the file (if the file has already been uploaded to VirusTotal). To do this, you need to form a GET request and specify the access key and file ID in the request. For example, if we have the scan_id from the previous example, the request will look like this:

import json
import requests
...
api_url = 'https://www.virustotal.com/vtapi/v2/file/report'
params = dict(apikey='<access key>', resource='275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f-1577043276')
response = requests.get(api_url, params=params)
if response.status_code == 200:
  result=response.json()
  print(json.dumps(result, sort_keys=False, indent=4))
...

If successful, we will see the following in response:

{
  "response_code": 1,
  "verbose_msg": "Scan finished, information embedded",
  "resource": "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f",
  "sha1": "3395856ce81f2b7382dee72602f798b642f14140",
  "md5": "44d88612fea8a8f36de82e1278abb02f",
  "sha256": "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f",
  "scan_date": "2019-11-27 08:06:03",
  "permalink": "https://www.virustotal.com/file/275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f/analysis/1577043276/",
  "positives": 59,
  "total": 69,
  "scans": {
    "bkav": {
      "detected": true,
      "version": "1.3.0.9899",
      "result": "DOS.EiracA.Trojan",
      "update": "20191220"
    },
    "DrWeb": {
      "detected": true,
      "version": "7.0.42.9300",
      "result": "EICAR Test File (NOT a Virus!)",
      "update": "20191222"
    },
    "MicroWorld-eScan": {
      "detected": true,
      "version": "14.0.297.0",
      "result": "EICAR-Test-File",
      "update": "20191222"
    },
    ...
    "Panda": {
      "detected": true,
      "version": "4.6.4.2",
      "result": "EICAR-AV-TEST-FILE",
      "update": "20191222"
    },
    "Qihoo-360": {
      "detected": true,
      "version": "1.0.0.1120",
      "result": "qex.eicar.gen.gen",
      "update": "20191222"
    }
  }
}

Here, as in the previous example, we get the file hash values, scan_id, permalink, response_code and verbose_msg values. We also see the results of the file scan by each antivirus, along with two summary fields: total (how many antivirus engines took part in the scan) and positives (how many of them gave a positive verdict).
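
The positives, total and scans fields described above are easy to condense into a short summary. The sketch below assumes a report dictionary shaped like the JSON shown earlier; the function name summarize_detections is hypothetical.

```python
def summarize_detections(report):
    """Return the detection counts and the sorted names of engines that flagged the file."""
    scans = report.get("scans", {})
    flagged = sorted(name for name, verdict in scans.items() if verdict.get("detected"))
    return {
        # Fall back to counting the scans when the summary fields are absent.
        "positives": report.get("positives", len(flagged)),
        "total": report.get("total", len(scans)),
        "engines": flagged,
    }
```

For example, summary = summarize_detections(response.json()) gives a dictionary whose "engines" entry lists only the antiviruses with a positive verdict.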

In order to output the results of all antivirus scans in a digestible form, you can, for example, write something like this:

import requests
...
api_url = 'https://www.virustotal.com/vtapi/v2/file/report'
params = dict(apikey='<access key>', resource='275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f-1577043276')
response = requests.get(api_url, params=params)
if response.status_code == 200:
  result=response.json()
  for key in result['scans']:
    print(key)
    print(' Detected: ', result['scans'][key]['detected'])
    print(' Version: ', result['scans'][key]['version'])
    print(' Update: ', result['scans'][key]['update'])
    print(' Result: ', result['scans'][key]['result'])
...

Sending a URL to a server for scanning

To submit a URL for scanning, we need to generate and send a POST request containing the access key and the URL itself:

import json
import requests
...
api_url = 'https://www.virustotal.com/vtapi/v2/url/scan'
params = dict(apikey='<access key>', url='https://brain-upd.com/programming/how-to-use-virustotal-api-with-python/')
response = requests.post(api_url, data=params)
if response.status_code == 200:
  result=response.json()
  print(json.dumps(result, sort_keys=False, indent=4))
...

The response will be roughly the same as the one returned when submitting a file, except that it contains no hash values. The contents of the scan_id field can be used to retrieve a scan report for the given URL.

Receive a report on the results of a URL scan

Let’s form a GET request with an access key and specify either the URL itself as a string or the scan_id value obtained with the previous function. This will look as follows:

import json
import requests
...
api_url = 'https://www.virustotal.com/vtapi/v2/url/report'
params = dict(apikey='<access key>', resource='https://brain-upd.com/programming/how-to-use-virustotal-api-with-python/', scan=0)
response = requests.get(api_url, params=params)
if response.status_code == 200:
  result=response.json()
  print(json.dumps(result, sort_keys=False, indent=4))
...

In addition to the access key and the URL string, there is an optional parameter, scan, which is zero by default. If its value is one and there is no information about the requested URL in the VirusTotal database (the URL has not been checked before), the URL will be automatically sent to the server for checking, and we will receive the same information in response as when we submit the URL ourselves. If this parameter is zero (or not set), we will get a report about this URL or, if there is no information about it in the VirusTotal database, a response like this:

{
  "response_code": 0,
  "resource": "<requested URL>",
  "verbose_msg": "Resource does not exist in the dataset"
}

Getting information about IP addresses and domains

To check IP addresses and domains, you need to form and send a GET request containing the access key and the domain name or IP address as a string. For a domain, it looks like this:

...
api_url = 'https://www.virustotal.com/vtapi/v2/domain/report'
params = dict(apikey='<access key>', domain='<domain name>')
response = requests.get(api_url, params=params)
...

To check an IP address:

...
api_url = 'https://www.virustotal.com/vtapi/v2/ip-address/report'
params = dict(apikey='<access key>', ip='<IP address>')
response = requests.get(api_url, params=params)
...

Responses to such queries are voluminous and contain a lot of information.

VirusTotal API – Version 3

The third version of the API has many more features than the second one, even with a free key. Moreover, while experimenting with version 3, I did not notice any limit on the number of objects (files or addresses) uploaded to the server within a minute. It seems that the restrictions are not enforced for the beta at all for now.

Version 3 API functions are designed using REST principles and are easy to understand. The access key here is passed in the request header.

Errors

In the third version of the API, the list of errors (and consequently of HTTP status codes) has been expanded. The following were added:

  • 401 – User Not Active Error, occurs when the user account is inactive;
  • 401 – Wrong Credentials Error, occurs if an invalid access key is used in the request;
  • 404 – Not Found Error, occurs when the requested analysis object is not found;
  • 409 – Already Exists Error, occurs when the resource already exists;
  • 429 – Quota Exceeded Error, occurs when one of the quotas for the number of requests (per-minute, daily or monthly) is exceeded. As I said before, during my experiments I observed no limit on the number of requests per minute, although I used a free key;
  • 429 – Too Many Requests Error, occurs when there are a large number of requests in a short period of time (may be caused by server load);
  • 503 – Transient Error, a temporary server error; retrying the request may succeed.
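
Of the codes above, 429 and 503 are the ones where waiting and repeating the request can help. Here is a minimal sketch of such a retry loop with exponentially growing pauses. The names send_with_retry and backoff_delays, and the specific delays, are my assumptions; a 429 caused by an exhausted daily or monthly quota will of course not go away after a few seconds.

```python
import time

# Status codes from the list above for which a later retry may succeed.
RETRYABLE_STATUS_CODES = {429, 503}


def backoff_delays(retries, base=1.0, cap=60.0):
    """Return exponentially growing delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def send_with_retry(send, retries=3, base=1.0, sleep=time.sleep):
    """Call `send()` until the status is not retryable or the retries run out.

    `send` is any callable returning an object with a `status_code` attribute,
    for example a lambda wrapping requests.get.
    """
    response = send()
    for delay in backoff_delays(retries, base):
        if response.status_code not in RETRYABLE_STATUS_CODES:
            break
        sleep(delay)
        response = send()
    return response
```

A call could look like send_with_retry(lambda: requests.get(api_url, headers=headers)), with api_url and headers defined as in the examples of this article.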

In case of an error, the server returns additional information in JSON form along with the status code. However, as it turned out, this is not true for every HTTP status code: for example, for error 404 the additional information may be a plain string.

The JSON format for the error is as follows:

{
  "error": {
    "code": "<http status code>",
    "message": "<message with error description>"
  }
}
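
Since the error body may be either JSON of this shape or a plain string, it is safer to parse it defensively. The helper below is a sketch with a hypothetical name, parse_v3_error; it returns a (code, message) pair and falls back to the raw text when the body is not the expected JSON.

```python
import json


def parse_v3_error(body_text):
    """Extract (code, message) from an error body; code is None for plain text."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Not JSON at all: treat the whole body as the message.
        return None, body_text.strip()
    if isinstance(payload, dict) and isinstance(payload.get("error"), dict):
        error = payload["error"]
        return error.get("code"), error.get("message")
    # Valid JSON, but not in the documented error shape.
    return None, body_text.strip()
```

With requests, the body text is available as response.text, so the call would be parse_v3_error(response.text).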

File handling functions

The third version of the API allows you to:

  • upload files to the server for analysis;
  • get a URL for uploading a file larger than 32 MB to the server;
  • get reports on file analysis results;
  • reanalyze a file;
  • get comments from VirusTotal users on a file;
  • post your own comment on a file;
  • see the voting results for a file;
  • vote on a file;
  • get extended information about a file.

To upload a file to the server you need to send it via POST request. This can be done as follows:

... 
api_url = 'https://www.virustotal.com/api/v3/files'
headers = {'x-apikey' : '<API access key>'}
with open('<file path>', 'rb') as file:
  files = {'file': ('<file path>', file)}
  response = requests.post(api_url, headers=headers, files=files)
...

The response we’ll get is this:

{
  "data": {
    "id": "ZTRiNjgxZmJmZmZmRkZTNlM2YyODlkMzk5MTZhZjYwNDI6MTU3NzIxOTQ1Mg==",
    "type": "analysis"
  }
}

Here we see the id value, which serves as the identifier of the submitted file's analysis. Use this identifier to get information about the file analysis through GET requests of the /analyses type (we will talk about that a bit later).

To get a URL for uploading a large file (over 32 MB), send a GET request to https://www.virustotal.com/api/v3/files/upload_url, passing the access key in the header:

...
api_url = 'https://www.virustotal.com/api/v3/files/upload_url'
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers)
...

In response we will get JSON with the address to which the file should be uploaded for analysis. The received URL can be used only once.

To get information about a file that the service has already analyzed, make a GET request with the file identifier in the URL (it can be the SHA-256, SHA-1 or MD5 hash of the file). Just like in the previous cases, specify the access key in the header:

...
api_url = 'https://www.virustotal.com/api/v3/files/<file ID value>'
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers)
...

In response, we will get a file scan report, where in addition to the results of scanning by all VirusTotal antiviruses, there will be a lot of additional information, the composition of which depends on the type of file scanned. For example, for executable files we can see information about such attributes:

{
  "attributes": {
    "authentihash": "8fcc2f670a166ea78ca239375ed312055c74efdc1f47e79d69966461dd1b2fb6",
    "creation_date": 1270596357,
    "exiftool": {
      "CharacterSet": "unicode",
      "CodeSize": 20480,
      "CompanyName": "TYV",
      "EntryPoint": "0x109c",
      "FileFlagsMask": "0x0000",
      "FileOS": "Win32",
      "FileSubtype": 0,
      "FileType": "Win32 EXE",
      "FileTypeExtension": "exe",
      "FileVersion": 1.0,
      "FileVersionNumber": "1.0.0.0",
      "ImageFileCharacteristics": "No relocs, Executable, No line numbers, No symbols, 32-bit",
      ...
      "SubsystemVersion": 4.0,
      "TimeStamp": "2010:04:07 00:25:57+01:00",
      "UninitializedDataSize": 0
    },
    ...
  }
}

Or, for example, information about sections of an executable file:

{
  "sections": [
    {
      "entropy": 3.94,
      "md5": "681b80f1ee0eb1531df11c6ae115d711",
      "name": ".text",
      "raw_size": 20480,
      "virtual_address": 4096,
      "virtual_size": 16588
    },
    {
      "entropy": 0.0,
      "md5": "d41d8cd98f00b204e9800998ecf8427e",
      "name": ".data",
      "raw_size": 0,
      "virtual_address": 24576,
      "virtual_size": 2640
    },
    ...
  ]
}
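
As one example of what can be done with this section data, the sketch below picks out sections whose entropy is unusually high, which is sometimes a hint of packed or encrypted content. The function name and the threshold of 7.0 (out of a maximum of 8 bits per byte) are my own arbitrary assumptions, not values suggested by VirusTotal.

```python
def high_entropy_sections(sections, threshold=7.0):
    """Return the names of sections whose entropy is at or above the threshold."""
    return [
        section["name"]
        for section in sections
        if section.get("entropy", 0.0) >= threshold
    ]
```

Here sections is the list stored under the "sections" key in the response fragment shown above.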

If the file has not been uploaded to the server before and has not been analyzed yet, we will get a Not Found Error with a 404 HTTP status code in the response:

{
  "error": {
    "code": "NotFoundError",
    "message": "File \"<file ID>\" not found"
  }
}
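
Because the file identifier is just a hash of the file contents, it can be computed locally, which lets you ask for a report before deciding whether an upload is needed at all. A minimal sketch using the standard hashlib module follows; the function name file_sha256 is my own.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be substituted for the file ID in the request URL. Reading in chunks keeps memory use constant even for large files.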

To reanalyze a file, we send a POST request to the server, putting the file ID in the URL and adding /analyse at the end:

...
api_url = 'https://www.virustotal.com/api/v3/files/<file ID value>/analyse'
headers = {'x-apikey' : '<API access key>'}
response = requests.post(api_url, headers=headers)
...

The response will include the same file descriptor as in the first case – when the file is uploaded to the server. And just like in the first case, the identifier from the descriptor can be used to get information about file analysis through a GET request like /analyses.

You can view the comments from the users of the service, as well as the results of voting on the file, by sending the appropriate GET-request to the server. To get the comments:

...
api_url = 'https://www.virustotal.com/api/v3/files/<file ID value>/comments'
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers)
...

To get the results of the vote:

...
api_url = 'https://www.virustotal.com/api/v3/files/<file ID value>/votes'
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers)
...

In both cases, you can use the optional limit parameter which defines the maximum number of comments or votes in a response. This parameter can be used as follows:

...
limit = {'limit': '<maximum number of items>'}
api_url = 'https://www.virustotal.com/api/v3/files/<file ID value>/votes'
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers, params=limit)
...

To post a comment or vote for a file, we create a POST request, and pass the comment or vote as a JSON object:

...
## To send a vote
votes = {'data': {'type': 'vote', 'attributes': {'verdict': '<malicious or harmless>'}}}
api_url = 'https://www.virustotal.com/api/v3/files/<file ID value>/votes'
headers = {'x-apikey' : '<API access key>'}
response = requests.post(api_url, headers=headers, json=votes)
...
## To send a comment
comments = {'data': {'type': 'comment', 'attributes': {'text': '<comment text>'}}}
headers = {'x-apikey' : '<API access key>'}
api_url = 'https://www.virustotal.com/api/v3/files/<file ID value>/comments'
response = requests.post(api_url, headers=headers, json=comments)
...
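
To avoid mistakes in these nested dictionaries, the request bodies can be built by small helper functions. The sketch below uses hypothetical names (build_vote, build_comment); the 'comment' type value for comments is taken from my reading of the version 3 documentation, so treat it as an assumption.

```python
ALLOWED_VERDICTS = ("malicious", "harmless")


def build_vote(verdict):
    """Build the JSON body for a vote, rejecting unknown verdicts early."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError("verdict must be one of {}".format(ALLOWED_VERDICTS))
    return {"data": {"type": "vote", "attributes": {"verdict": verdict}}}


def build_comment(text):
    """Build the JSON body for a comment, rejecting empty text."""
    if not text.strip():
        raise ValueError("comment text must not be empty")
    return {"data": {"type": "comment", "attributes": {"text": text}}}
```

The result is passed to requests.post through the json argument, for example json=build_vote('harmless').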

To get more information about a file, you can request details about the objects related to it. These objects can describe, for example, the file behavior (the behaviours object) or the URLs, IP addresses and domain names it contacted (the contacted_urls, contacted_ips and contacted_domains objects).

The behaviours object is the most interesting one. For executable files, for example, it includes information about loaded modules, created and started processes, file system and registry operations, and network operations.

To get this information, we send a GET request:

api_url = 'https://www.virustotal.com/api/v3/files/<file ID value>/behaviours'
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers)

The response will be a JSON object with information about the file’s behavior:

{
  "data": [
    {
      "attributes": {
        "analysis_date": 1548112224,
        "command_executions": [
          "C:\\WINDOWS\\system32\\ntvdm.exe -f -i1",
          "/bin/bash /private/tmp/eicar.com.sh"
        ],
        "has_html_report": false,
        "has_pcap": false,
        "last_modification_date": 1577880343,
        "modules_loaded": [
          "c:\\windows\\system32\\user32.dll",
          "c:\\windows\\system32\\imm32.dll",
          "c:\\windows\\system32\\ntdll.dll"
        ]
      },
      ...
    }
  ]
}
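
The data list may contain several behaviour reports, so it can be convenient to merge one list-valued attribute across all of them. The sketch below assumes the response shape shown above; the function name collect_attribute is my own.

```python
def collect_attribute(behaviour_response, attribute):
    """Collect the unique values of a list attribute across all behaviour reports."""
    seen = []
    for item in behaviour_response.get("data", []):
        for value in item.get("attributes", {}).get(attribute, []):
            if value not in seen:   # keep the first occurrence, preserve order
                seen.append(value)
    return seen
```

For example, collect_attribute(response.json(), 'modules_loaded') returns every module name that appears in at least one report.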

Functions for working with URLs

The list of possible URL operations includes:

  • sending a URL to the server for analysis;
  • getting information about a URL;
  • reanalyzing a URL;
  • getting comments from VirusTotal users on a URL;
  • posting your own comment on a URL;
  • getting the voting results for a URL;
  • sending your vote on a URL;
  • getting extended information about a URL;
  • retrieving information about the domain or IP address of a URL.

Most of the above operations (except for the last one) are performed in the same way as the corresponding operations with files. The URL identifier can be either the URL string encoded in Base64 without the trailing "=" padding characters, or the SHA-256 hash of the URL. It can be computed in the following way:

## For Base64
import base64
...
id_url = base64.urlsafe_b64encode(url.encode('utf-8')).decode('utf-8').rstrip('=')
...
## For SHA-256
import hashlib
...
id_url = hashlib.sha256(url.encode()).hexdigest()

To submit a URL for analysis, you must use a POST request:

data = {'url': '<URL name string>'}
api_url = 'https://www.virustotal.com/api/v3/urls'
headers = {'x-apikey' : '<API access key>'}
response = requests.post(api_url, headers=headers, data=data)

In response, we will see a URL descriptor (similar to a file descriptor):

{
  "data": {
    "id": "u-1a565d28f8412c3e4b65ec8267ff8e77eb00a2c76367e653be774169ca9d09a6-1577904977",
    "type": "analysis"
  }
}

The id value from this descriptor is used to get information about the URL analysis through a GET request of the /analyses type (more about this request towards the end of the article).

To get information about the domain or IP address associated with a URL, send a GET request of the /network_location type (here we use the Base64 or SHA-256 URL identifier):

api_url = 'https://www.virustotal.com/api/v3/urls/<URLID (Base64 or SHA-256)>/network_location'
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers)

Other operations with URL are performed in the same way as similar operations with files.

Functions for working with domains and IP addresses

In VirusTotal API v3, this group of functions allows you to:

  • retrieve information about a domain or IP address;
  • get comments from VirusTotal users on a domain or IP address;
  • post your own comment on a domain or IP address;
  • get the voting results for a domain or IP address;
  • send a vote on a domain or IP address;
  • get extended information about a domain or IP address.

All these operations are implemented similarly to the same operations with files or URLs. The difference is that domain names or IP address values are used directly, rather than identifiers.
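
To illustrate using the value directly, the sketch below builds the request URL from a domain name or an IP address. The collection names domains and ip_addresses come from my understanding of the version 3 documentation and are not shown in this article, so treat them as assumptions; the function name is hypothetical too.

```python
import ipaddress

API_ROOT = "https://www.virustotal.com/api/v3"


def network_object_url(value, relationship=None):
    """Build a version 3 URL for a domain name or an IP address."""
    try:
        ipaddress.ip_address(value)      # raises ValueError for anything but an IP
        collection = "ip_addresses"      # assumed collection name
    except ValueError:
        collection = "domains"           # assumed collection name
    url = "{}/{}/{}".format(API_ROOT, collection, value)
    if relationship:
        url += "/" + relationship        # e.g. 'comments' or 'votes'
    return url
```

The resulting URL is used with requests.get or requests.post and the x-apikey header, exactly as in the file examples.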

GET request like /analyses

This request returns the results of the analysis of files or URLs after they have been uploaded to the server or reanalyzed. Use the identifier from the id field of the file or URL descriptor that was returned by the upload or reanalysis request.

For example, you can form such a request for a file like this:

TEST_FILE_ID = 'ZTRiNjgxZmJmZmRkZTNlM2YyODlkMzk5MTZhZjYwNDI6MTU3NjYwMTE1Ng=='
...
api_url = 'https://www.virustotal.com/api/v3/analyses/' + TEST_FILE_ID
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers)

And an option for the URL:

TEST_URL_ID = 'u-dce9e8fbe86b145e18f9dcd4aba6bba9959fdff55447a8f9914eb9c4fc1931f9-1576610003'
...
api_url = 'https://www.virustotal.com/api/v3/analyses/' + TEST_URL_ID
headers = {'x-apikey' : '<API access key>'}
response = requests.get(api_url, headers=headers)
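
An analysis is usually not finished right after the upload, so in practice this request is repeated until the result is ready. Below is a sketch of such a polling loop. It assumes that the analysis object carries a status field under data.attributes and that the final value is "completed"; this is my reading of the version 3 documentation, not something shown in this article. The fetch callable and all parameter names are hypothetical.

```python
import time


def wait_for_analysis(fetch, analysis_id, attempts=10, delay=15.0, sleep=time.sleep):
    """Poll `fetch(analysis_id)` until the analysis is completed.

    `fetch` is any callable returning the parsed JSON of an /analyses request.
    Returns the final JSON, or None if the analysis did not finish in time.
    """
    for attempt in range(attempts):
        analysis = fetch(analysis_id)
        status = analysis.get("data", {}).get("attributes", {}).get("status")
        if status == "completed":        # assumed final status value
            return analysis
        if attempt < attempts - 1:       # no need to sleep after the last try
            sleep(delay)
    return None
```

With requests, fetch can be a small function that performs the GET request shown above and returns response.json().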

Conclusion

We have gone through all the main functions of the VirusTotal API. You can borrow the code given here for your own projects. If you use the second version, make sure that you do not send requests too often; in my experiments, the third version did not enforce this limit yet. I recommend choosing the third version, because its capabilities are also much wider, and sooner or later it will become the main one.
