File
This section describes the schema of the File object model in Assemblyline. Each entry is a field of a File document in the File index, listed with its data type, a short description, whether it is required, and its default value, if any.
Knowing this schema helps you write precise Lucene search queries. Using the fields in the table below, you can retrieve files analyzed by Assemblyline by filtering on attributes such as type, hash values, classification, and many others.
Use this schema as a reference whenever you need more targeted searches in Assemblyline.
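As an illustration of how these fields can be combined into a Lucene query string, the sketch below builds one with plain string helpers. The helper names (`field_query`, `range_query`, `all_of`) are hypothetical and not part of Assemblyline, the field values are example data, and the dotted form `seen.count` assumes that nested fields are addressed with dot notation.

```python
def field_query(field: str, value: str) -> str:
    """Return a `field:"value"` term with backslashes and quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def range_query(field: str, low="*", high="*") -> str:
    """Return an inclusive Lucene range term such as `size:[1024 TO *]`."""
    return f"{field}:[{low} TO {high}]"


def all_of(*terms: str) -> str:
    """Join terms with AND, wrapping each term in parentheses."""
    return " AND ".join(f"({term})" for term in terms)


# Example values only: a file type, a high-entropy threshold, and a low sighting count.
query = all_of(
    field_query("type", "executable/windows/pe32"),
    range_query("entropy", 7.0),
    range_query("seen.count", 1, 5),
)
print(query)
# (type:"executable/windows/pe32") AND (entropy:[7.0 TO *]) AND (seen.count:[1 TO 5])
```

The resulting string can then be pasted into the search bar or passed to whichever client you use to query the File index.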
Field | Type | Description | Required | Default |
---|---|---|---|---|
archive_ts | Date | Timestamp indicating when the file was archived. | Optional | None |
ascii | Keyword | Provides a dotted ASCII representation of the first 64 bytes of the file. | Yes | None |
classification | Classification | Security classification assigned to the file based on its contents and context. | Yes | None |
comments | List [Comment] | An array of user-generated comments pertaining to the file. See Comment model for more information. | Yes | [] |
entropy | Float | A numerical value representing the file's entropy, which is defined as the level of randomness in the file's content, typically used to detect compression or encryption. High entropy may indicate obfuscation techniques such as encryption, commonly employed by malware to evade detection. This metric is not exclusive to malicious files, as legitimate files can also exhibit high entropy. | Yes | None |
expiry_ts | Date | Timestamp indicating when the file is scheduled to expire from the system. | Optional | None |
is_section_image | Boolean | Indicates if the file is an image safe for web browser display, often part of analysis results. | Yes | False |
is_supplementary | Boolean | Indicates if the file was created by an Assemblyline service as supplementary data. | Yes | False |
hex | Keyword | Hexadecimal representation of the first 64 bytes of the file. | Yes | None |
labels | List [Keyword] | Array of descriptive labels applied to the file for categorization and analysis. | Yes | [] |
label_categories | LabelCategories | Structured categories for the labels applied to the file. | Yes | See LabelCategories for more details. |
md5 | MD5 | The MD5 hash of the file, used for identifying duplicates and verifying integrity. | Yes | None |
magic | Keyword | Detailed file format information derived from an analysis using the libmagic library, including text descriptions of the file's content type and encoding. | Yes | None |
mime | Keyword | The Multipurpose Internet Mail Extensions (MIME) type of the file as determined by libmagic, which identifies file types by checking their headers according to a predefined list of file types. | Optional | None |
seen | Seen | Records the frequency and timestamps of when the file was encountered. | Yes | See Seen for more details. |
sha1 | SHA1 | The SHA1 hash of the file, providing a more secure alternative to MD5 for integrity checks. | Yes | None |
sha256 | SHA256 | The SHA256 hash of the file, offering a high level of security for integrity verification. | Yes | None |
size | Integer | Size of the file in bytes. | Yes | None |
ssdeep | SSDeepHash | The fuzzy hash of the file using SSDEEP, which is useful for identifying similar files. | Yes | None |
type | Keyword | The file type as determined by the Assemblyline file type identification service. | Yes | None |
tlsh | Keyword | A locality-sensitive hash (TLSH) of the file's content, useful for similarity comparisons. | Optional | None |
from_archive | Boolean | Indicates whether the file was retrieved from Assemblyline's archive during processing. | Yes | False |
uri_info | URIInfo | Detailed components of the file's URI for advanced search functionality. | Optional | None |
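To make the `ascii`, `hex`, `entropy`, hash, and `size` fields more concrete, the following is a minimal sketch of how such values could be derived from raw bytes. It is not Assemblyline's implementation: the function names are hypothetical, the entropy shown is plain Shannon entropy in bits per byte, and the rule for replacing non-printable bytes with dots is an assumption.

```python
import hashlib
import math
from collections import Counter


def dotted_ascii(data: bytes, length: int = 64) -> str:
    """Printable ASCII for the first `length` bytes, '.' for everything else."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data[:length])


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def describe(data: bytes) -> dict:
    """Build a dictionary that mirrors a few fields of the File schema."""
    return {
        "ascii": dotted_ascii(data),
        "hex": data[:64].hex(),
        "entropy": shannon_entropy(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


# Example input only: the first bytes of a made-up buffer.
print(describe(b"MZ\x90\x00example payload"))
```

A buffer in which every byte value occurs equally often reaches the maximum of 8.0, which is why compressed or encrypted content tends to score close to that value.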
Comment
Model that represents user annotations attached to a file.
A Comment is a user-generated note or observation that can be added to a file within Assemblyline. This feature enables analysts to record insights, share findings, and collaborate on the analysis of a file. Each comment is timestamped and associated with the username of the individual who authored it, creating an audit trail of analytical discourse.
Field | Type | Description | Required | Default |
---|---|---|---|---|
cid | UUID | Unique identifier for the comment. | Yes | None |
uname | Keyword | The username of the individual who authored the comment. | Yes | None |
date | Date | The date and time when the comment was posted. | Yes | NOW |
text | Text | The content of the comment as written by the user. | Yes | None |
reactions | List [Reaction] | An array of user reactions to the comment, such as likes or dislikes. | Yes | [] |
Reaction
Model that encapsulates user interactions with a comment.
The Reaction model captures the responses of users to comments made on a file. Reactions are simple expressions of agreement, disagreement, or sentiment, represented by a set of predefined icons. These reactions facilitate a quick, non-verbal form of feedback from users, enhancing collaborative analysis and engagement within the Assemblyline platform.
Field | Type | Description | Required | Default |
---|---|---|---|---|
icon | Enum | Icon name representing the type of reaction given to a comment. Supported values are: "love", "party", "smile", "surprised", "thumbs_down", "thumbs_up" | Yes | None |
uname | Keyword | The username of the individual who reacted to the comment. | Yes | None |
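The relationship between a Comment and its Reaction entries can be sketched with plain dictionaries. The comment below is made-up sample data, and `tally_reactions` is a hypothetical helper that counts reactions per icon while rejecting icons outside the supported set listed above.

```python
from collections import Counter

# Icon names taken from the Reaction table above.
ALLOWED_ICONS = {"love", "party", "smile", "surprised", "thumbs_down", "thumbs_up"}

# Sample data only: the cid, usernames, and text are invented for illustration.
comment = {
    "cid": "00000000-0000-0000-0000-000000000000",
    "uname": "analyst1",
    "text": "Packed sample, see the unpacked child file.",
    "reactions": [
        {"icon": "thumbs_up", "uname": "analyst2"},
        {"icon": "thumbs_up", "uname": "analyst3"},
        {"icon": "party", "uname": "analyst4"},
    ],
}


def tally_reactions(comment: dict) -> Counter:
    """Count reactions per icon, raising ValueError on an unsupported icon."""
    counts = Counter()
    for reaction in comment.get("reactions", []):
        icon = reaction["icon"]
        if icon not in ALLOWED_ICONS:
            raise ValueError(f"unsupported reaction icon: {icon!r}")
        counts[icon] += 1
    return counts


print(tally_reactions(comment))
```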
LabelCategories
Structured categorization model for labels applied to a file.
LabelCategories provide a systematic approach to classifying the characteristics and threat indicators of a file. This model organizes labels into distinct categories such as informational tags, technical techniques, and attribution links. By categorizing labels, analysts can efficiently navigate and assess the nature and potential threats associated with a file, streamlining the malware analysis process.
Field | Type | Description | Required | Default |
---|---|---|---|---|
info | List [Keyword] | Informational labels providing additional context about the file. | Yes | [] |
technique | List [Keyword] | An array of labels identifying the specific tactics, techniques, and procedures (TTPs) as defined by the MITRE ATT&CK® framework that are exhibited by the malware within the file. This field also includes labels for any detection signatures that triggered during analysis, providing insight into the malware's behavior and potential impact. Analysts can use these labels to correlate files with known adversary behavior and to enhance threat hunting and incident response activities. | Yes | [] |
attribution | List [Keyword] | Labels that relate to the attribution of the file, such as the associated threat actor or campaign. | Yes | [] |
Seen
Tracking model for the occurrence and frequency of a file within the system.
The Seen model is designed to record and quantify the instances in which a file is encountered by Assemblyline. It keeps a count of the file's occurrences and logs the timestamps of the first and most recent sightings. This temporal information is crucial for understanding the prevalence and distribution of a file over time, aiding in threat trend analysis and situational awareness.
Field | Type | Description | Required | Default |
---|---|---|---|---|
count | Integer | The total number of times the file has been observed by the system. | Yes | 1 |
first | Date | The timestamp of the file's first sighting. | Yes | NOW |
last | Date | The timestamp of the file's most recent sighting. | Yes | NOW |
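The bookkeeping described for the Seen model can be sketched as a small dataclass. This is an assumption about how the three fields relate, not Assemblyline's code: the `Seen` class and its `record_sighting` method are hypothetical, and the defaults simply mirror the table above (`count` of 1, `first` and `last` set to the current time).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    """Current time as a timezone-aware UTC timestamp."""
    return datetime.now(timezone.utc)


@dataclass
class Seen:
    count: int = 1
    first: datetime = field(default_factory=_now)
    last: datetime = field(default_factory=_now)

    def record_sighting(self, when: Optional[datetime] = None) -> None:
        """Increment the counter and widen the first/last window if needed."""
        when = when or _now()
        self.count += 1
        self.first = min(self.first, when)
        self.last = max(self.last, when)


# Example timestamps only.
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 3, 1, tzinfo=timezone.utc)

seen = Seen(first=t1, last=t1)
seen.record_sighting(t2)
print(seen.count, seen.first.date(), seen.last.date())
# 2 2024-01-01 2024-03-01
```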
URIInfo
Detailed breakdown model of a file's Uniform Resource Identifier (URI).
URIInfo dissects a file's URI into its fundamental components, providing granular data for advanced search and identification. This includes the scheme, network location, path, and other elements such as query parameters and fragments. By parsing these components, Assemblyline allows for a more nuanced analysis of the source and context of a file, which is essential for forensic investigations and threat intelligence gathering.
Field | Type | Description | Required | Default |
---|---|---|---|---|
uri | Keyword | The complete Uniform Resource Identifier (URI) of the file. | Yes | None |
scheme | Keyword | The scheme component of the URI (e.g., "http", "ftp"). | Yes | None |
netloc | Keyword | The network location part of the URI, including the domain name and port. | Yes | None |
path | Keyword | The path component of the URI, specifying the resource within the host. | Optional | None |
params | Keyword | The parameters component of the URI, often used for session management. | Optional | None |
query | Keyword | The query string of the URI, containing data for server-side processing. | Optional | None |
fragment | Keyword | The fragment identifier of the URI, used to navigate to a specific part of the resource. | Optional | None |
username | Keyword | The username specified in the URI, if any. | Optional | None |
password | Keyword | The password specified in the URI, if any. | Optional | None |
hostname | Keyword | The hostname extracted from the netloc, representing the domain of the URI. | Yes | None |
port | Integer | The port number extracted from the netloc, representing the communication endpoint. | Optional | None |
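The field names in this table match the attributes returned by Python's standard `urllib.parse.urlparse`, so the decomposition can be illustrated with it. Whether Assemblyline uses this exact function is an assumption; `split_uri` is a hypothetical helper, the URI is example data, and mapping empty optional components to `None` is a choice made here to mirror the table's defaults.

```python
from urllib.parse import urlparse


def split_uri(uri: str) -> dict:
    """Break a URI into the components listed in the URIInfo table."""
    parts = urlparse(uri)
    return {
        "uri": uri,
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        # Optional components: an empty string is reported as None.
        "path": parts.path or None,
        "params": parts.params or None,
        "query": parts.query or None,
        "fragment": parts.fragment or None,
        # These attributes are already None when absent from the URI.
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
    }


# Example URI only.
info = split_uri("https://user:pw@example.com:8443/a/b;v=1?x=1#top")
for name, value in info.items():
    print(f"{name}: {value}")
```

With components split out this way, a search can target a single part of the URI, for example the hostname alone, without matching on the full string.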