Choosing your ingestion method¶
When integrating Assemblyline with other systems, the first thing you need to do is pick an ingestion method.
Assemblyline gives you two options:
- Asynchronous (Using the Ingest API: /api/v4/ingest/)
- Synchronous (Using the Submit API: /api/v4/submit/)
Below is a rundown of the particularities of each method so you can pick the one that best fits your needs.
Asynchronous ingestion¶
This is the preferred ingestion method for use with Assemblyline. In this mode, Assemblyline queues your submissions based on priority and processes them when the services have spare processing cycles. Each submission in this mode is assigned an ingestion ID, and you can be notified via a completion queue when your file has finished scanning. Alternatively, you can use the alerting page in the Assemblyline UI if you only want to view asynchronous submissions that Assemblyline deems highly suspicious.
The asynchronous model was built to sustain a large sample set of files and to help analysts focus on what is important.
Benefits and Drawbacks¶
Benefits
- Supports a large volume of files for processing
- Not subject to quota limits
- Allows for alerting functionality to be used
- Will resort to data sampling if it gets overwhelmed with too many files
- Does submission-level caching if the same file is submitted twice with the same parameters, for performance optimization
Drawbacks
- Submissions may sit in the queue a long time if the system is busy
- Submissions may be skipped if the system is overwhelmed
- Metadata is not searchable for all submissions, since the system does not create a submission entry for cached submissions
Typical use cases¶
Here are the typical use cases that users encounter while using the asynchronous submission mode in the system.
Using the Ingest API while reading a message from the notification queue
- The user submits all their files and receives an ingestion ID for each file
- API: /api/v4/ingest/
- The user asks the notification queue for messages until a confirmation message has been received for every file
- API: /api/v4/ingest/get_message_list/
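The queue-polling flow above can be sketched as follows. This is a minimal illustration, not the official client: `fetch_messages` is a hypothetical callable that you would implement as a call to /api/v4/ingest/get_message_list/ for your notification queue, and the assumption that each message is a dictionary carrying an `ingest_id` key should be checked against your own deployment.

```python
import time
from typing import Callable, Dict, Iterable, List


def collect_results(
    ingest_ids: Iterable[str],
    fetch_messages: Callable[[], List[dict]],
    poll_interval: float = 1.0,
    max_polls: int = 60,
) -> Dict[str, dict]:
    """Poll the notification queue until every ingest ID has a message.

    fetch_messages is a hypothetical wrapper around
    /api/v4/ingest/get_message_list/ that returns the messages currently
    waiting in the queue (possibly an empty list).
    ASSUMPTION: each message is a dict with an "ingest_id" key.
    """
    pending = set(ingest_ids)
    results: Dict[str, dict] = {}
    for _ in range(max_polls):
        if not pending:
            break
        for message in fetch_messages():
            ingest_id = message.get("ingest_id")
            if ingest_id in pending:
                pending.discard(ingest_id)
                results[ingest_id] = message
        if pending:
            # Nothing (or not everything) has arrived yet: wait, then retry.
            time.sleep(poll_interval)
    if pending:
        raise TimeoutError(f"No message received for: {sorted(pending)}")
    return results
```

The loop is deliberately bounded by `max_polls`: since submissions may wait a long time in the queue or be skipped when the system is overwhelmed, waiting forever for every ingestion ID is not a safe default.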
Using the Ingest API ignoring the notification queue but using the alert perspective
- The user submits all their files and ignores the returned ingestion IDs
- API: /api/v4/ingest/
- The user then monitors the UI alerting perspective for newly created alerts
- UI: /alerts
Synchronous ingestion¶
In this mode, Assemblyline starts scanning your file right away and returns the ID of your submission. You can use this ID to ask the system whether the submission is complete and to pull the results once all the services are done reporting results for that submission.
This mode is better suited to a small volume of files and manual analysis. Files submitted via the user interface use the synchronous mode.
Benefits and Drawbacks¶
Benefits
- Instant scanning
- Higher priority than asynchronous
- Submission guaranteed to be processed (no data sampling)
- Metadata searchable for all submissions
Drawbacks
- Subject to quota (Default: 5 concurrent submissions)
- Not suited for a large volume of files
- No submission-level caching
- Alerting not available
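If you do script synchronous submissions, one way to stay within the concurrent-submission quota is to cap the number of in-flight submissions on the client side. The sketch below assumes a hypothetical `submit_and_wait` callable that submits one file and only returns once that submission is complete, so each worker thread holds exactly one quota slot at a time. The default of 5 mirrors the default quota listed above; adjust it to your account's actual quota.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def process_within_quota(
    files: Iterable[str],
    submit_and_wait: Callable[[str], dict],
    max_concurrent: int = 5,
) -> List[dict]:
    """Run submit_and_wait for each file with at most max_concurrent in flight.

    submit_and_wait is a hypothetical function: it must submit one file and
    block until that submission has completed, then return its results.
    """
    # The pool size is the cap: a new file is only picked up when a worker
    # has finished its previous submission.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the input files.
        return list(pool.map(submit_and_wait, files))
```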
Typical use cases¶
Here are the typical use cases that users encounter while using the synchronous submission mode in the system.
Using the Submit API waiting for the submission to be done
- The user sends their file for processing and receives an ID for the submission
- API: /api/v4/submit/
- The user queries the is completed API until the system says the submission is completed
- API: /api/v4/submission/is_completed/
- The user pulls the results for the submission
- API: /api/v4/submission/full/
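The submit, poll, and pull-results steps above can be sketched as a small polling helper. This is an illustration under stated assumptions rather than the official client: `is_completed` and `fetch_full` are hypothetical callables that you would implement as calls to /api/v4/submission/is_completed/ and /api/v4/submission/full/ for the submission ID returned by /api/v4/submit/. The polling interval and timeout values are arbitrary choices, not Assemblyline defaults.

```python
import time
from typing import Callable


def wait_and_fetch(
    sid: str,
    is_completed: Callable[[str], bool],
    fetch_full: Callable[[str], dict],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until the submission is complete, then return its full results.

    is_completed and fetch_full are hypothetical wrappers around the
    is_completed and full submission APIs; sid is the submission ID
    returned when the file was submitted.
    """
    deadline = time.monotonic() + timeout
    while not is_completed(sid):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Submission {sid} did not complete in time")
        time.sleep(poll_interval)
    # Only pull results once the system reports the submission as complete.
    return fetch_full(sid)
```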
Mass Submission Toolkit¶
The Assemblyline Incident Manager can help you submit a large number of files, such as every file on a hard drive. It uses the Ingest API described above.
The Cyber Centre uses this tool when we are in a pinch and need to ingest millions of files to Assemblyline without learning how to use the APIs.
One key consideration when submitting a large volume of files in a burst is the set of default sampling values in the Ingester Configuration. You must keep your ingestion flow at a rate such that the size of the priority ingestion queue remains lower than the corresponding priority queue sampling_at value; otherwise, Assemblyline will skip files.
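If you script your own bulk ingestion, the rate constraint above can be sketched as a simple throttle. Everything named here is an assumption for illustration: `ingest_file` stands in for a call to the Ingest API, `queue_size` is a hypothetical function returning the current size of the ingestion queue for the priority you submit at (how you obtain that number depends on your deployment), and `sampling_at` must be the value from your own Ingester Configuration. The `headroom` factor is an arbitrary safety margin, not an Assemblyline setting.

```python
import time
from typing import Callable, Iterable


def throttled_ingest(
    files: Iterable[str],
    ingest_file: Callable[[str], object],
    queue_size: Callable[[], int],
    sampling_at: int,
    headroom: float = 0.8,
    poll_interval: float = 5.0,
) -> int:
    """Ingest files while keeping the queue below the sampling threshold.

    ingest_file and queue_size are hypothetical callables supplied by the
    caller; sampling_at comes from the Ingester Configuration.
    Returns the number of files sent.
    """
    # Stop short of the real threshold to leave room for other submitters.
    limit = int(sampling_at * headroom)
    sent = 0
    for path in files:
        # Pause while the queue is at or above our self-imposed limit.
        while queue_size() >= limit:
            time.sleep(poll_interval)
        ingest_file(path)
        sent += 1
    return sent
```

Checking the queue before every file keeps the logic easy to follow; for very large batches you may prefer to check once every few hundred files to reduce the number of queue-size lookups.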