File
This section presents the detailed schema of the File object model within the Assemblyline application. Each entry in the schema represents a field that constitutes a File document within the File index. The information provided for each field includes the data type, a concise description, its requirement status, and the default value if any.
Understanding this schema is crucial for constructing effective and precise Lucene search queries. By leveraging the fields outlined in the table below, you can craft queries to retrieve specific information about files analyzed by Assemblyline. These fields are integral for in-depth data analysis, enabling you to filter and locate files based on various attributes such as type, hash values, classification, and many others.
Utilize this schema as a reference to enhance your search capabilities within the Assemblyline system, allowing for more targeted and refined data retrieval that aligns with your cybersecurity analysis needs.
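As a rough illustration of how these fields turn into Lucene queries, the sketch below assembles a query string from a few field/value pairs. The helper names (`escape_term`, `term`, `at_least`, `all_of`) are hypothetical and not part of any Assemblyline API. The example file type value and the dotted `seen.count` path into the nested Seen model are assumptions based on common Lucene/Elasticsearch conventions.

```python
# Characters that Lucene's query syntax treats as special; they are
# backslash-escaped here when they appear inside a term.
_LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(value: str) -> str:
    """Backslash-escape Lucene special characters in a single term."""
    return "".join("\\" + ch if ch in _LUCENE_SPECIAL else ch for ch in value)


def term(field: str, value: str) -> str:
    """Build an exact-match clause of the form field:value."""
    return f"{field}:{escape_term(value)}"


def at_least(field: str, minimum: int) -> str:
    """Build an inclusive, open-ended range clause such as size:[1024 TO *]."""
    return f"{field}:[{minimum} TO *]"


def all_of(*clauses: str) -> str:
    """Join clauses with AND so that every clause must match."""
    return " AND ".join(clauses)


query = all_of(
    term("type", "executable/windows/pe64"),  # example value, assumed
    at_least("size", 1024),
    at_least("seen.count", 5),  # dotted path into the nested Seen model (assumed)
)
print(query)
```

The resulting string can be pasted into the search bar or passed to whichever client you use to query the File index.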
| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| archive_ts | Date | Timestamp indicating when the file was archived. | Optional | None |
| ascii | Keyword | Provides a dotted ASCII representation of the first 64 bytes of the file. | Yes | None |
| classification | Classification | Security classification assigned to the file based on its contents and context. | Yes | None |
| comments | List [Comment] | User comments linked to the file. | Yes | [] |
| entropy | Float | Entropy value indicating randomness or potential obfuscation. | Yes | None |
| expiry_ts | Date | Timestamp indicating when the file is scheduled to expire from the system. | Optional | None |
| is_section_image | Boolean | Indicates if the file is an image safe for web browser display, often part of analysis results. | Yes | False |
| is_supplementary | Boolean | Indicates if the file was created by an Assemblyline service as supplementary data. | Yes | False |
| hex | Keyword | Hexadecimal representation of the first 64 bytes of the file. | Yes | None |
| labels | List [Keyword] | Array of descriptive labels applied to the file for categorization and analysis. | Yes | [] |
| label_categories | LabelCategories | Structured categories for the labels applied to the file. | Yes | See LabelCategories for more details. |
| md5 | MD5 | The MD5 hash of the file, used for identifying duplicates and verifying integrity. | Yes | None |
| magic | Keyword | File type info derived from libmagic. | Yes | None |
| mime | Keyword | MIME type of the file from libmagic. | Optional | None |
| seen | Seen | Records the frequency and timestamps of when the file was encountered. | Yes | See Seen for more details. |
| sha1 | SHA1 | The SHA1 hash of the file, providing a more secure alternative to MD5 for integrity checks. | Yes | None |
| sha256 | SHA256 | The SHA256 hash of the file, offering a high level of security for integrity verification. | Yes | None |
| size | Long | Size of the file in bytes. | Yes | None |
| ssdeep | SSDeepHash | The fuzzy hash of the file using SSDEEP, which is useful for identifying similar files. | Yes | None |
| type | Keyword | The file type as determined by the Assemblyline file type identification service. | Yes | None |
| tlsh | Keyword | A locality-sensitive hash (TLSH) of the file's content, useful for similarity comparisons. | Optional | None |
| from_archive | Boolean | Indicates whether the file was retrieved from Assemblyline's archive during processing. | Yes | False |
| uri_info | URIInfo | Detailed components of the file's URI for advanced search functionality. | Optional | None |
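To make a few of these fields concrete, the following sketch derives `md5`, `sha1`, `sha256`, `size`, `entropy`, `hex`, and `ascii` from an in-memory buffer using only the Python standard library. It is an illustration under assumptions, not Assemblyline's own identification code: entropy is taken to be Shannon entropy in bits per byte, and "dotted ASCII" is taken to mean that non-printable bytes are shown as `.`. The function names are hypothetical.

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant data) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def dotted_ascii(data: bytes, length: int = 64) -> str:
    """Render the first `length` bytes, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data[:length])


def describe(data: bytes) -> dict:
    """Compute a subset of the File fields for a buffer (assumed definitions)."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "entropy": shannon_entropy(data),
        "hex": data[:64].hex(),
        "ascii": dotted_ascii(data),
    }


info = describe(b"MZ\x90\x00\x03\x00\x00\x00")
print(info["ascii"], info["hex"], info["size"])
```

Values produced this way can then be used in searches, for example by querying the `sha256` field directly.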
Comment
Model that represents user annotations attached to a file.
A Comment is a user-generated note or observation that can be added to a file within Assemblyline. This feature enables analysts to record insights, share findings, and collaborate on the analysis of a file. Each comment is timestamped and associated with the username of the individual who authored it, creating an audit trail of analytical discourse.
| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| cid | UUID | Unique identifier for the comment. | Yes | None |
| uname | Keyword | The username of the individual who authored the comment. | Yes | None |
| date | Date | The date and time when the comment was posted. | Yes | NOW |
| text | Text | The content of the comment as written by the user. | Yes | None |
| reactions | List [Reaction] | An array of user reactions to the comment, such as likes or dislikes. | Yes | [] |
Reaction
Model that encapsulates user interactions with a comment.
The Reaction model captures the responses of users to comments made on a file. Reactions are simple expressions of agreement, disagreement, or sentiment, represented by a set of predefined icons. These reactions facilitate a quick, non-verbal form of feedback from users, enhancing collaborative analysis and engagement within the Assemblyline platform.
| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| icon | Enum | Icon name representing the type of reaction given to a comment. Supported values are: "love", "party", "smile", "surprised", "thumbs_down", "thumbs_up" | Yes | None |
| uname | Keyword | The username of the individual who reacted to the comment. | Yes | None |
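The relationship between Comment and Reaction can be pictured with a pair of Python dataclasses. This is only a sketch of the shape described in the two tables above; the class definitions are hypothetical and are not Assemblyline's model code. The icon check mirrors the supported values listed for `icon`, and the `date` default stands in for `NOW`.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

# Icon names listed as supported values in the Reaction table.
REACTION_ICONS = {"love", "party", "smile", "surprised", "thumbs_down", "thumbs_up"}


@dataclass
class Reaction:
    icon: str
    uname: str

    def __post_init__(self) -> None:
        # Reject anything outside the enumerated icon names.
        if self.icon not in REACTION_ICONS:
            raise ValueError(f"unsupported reaction icon: {self.icon!r}")


@dataclass
class Comment:
    cid: str
    uname: str
    text: str
    # Defaults follow the table: NOW for date, an empty list for reactions.
    date: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    reactions: List[Reaction] = field(default_factory=list)


comment = Comment(cid=str(uuid.uuid4()), uname="analyst1", text="Looks packed.")
comment.reactions.append(Reaction(icon="thumbs_up", uname="analyst2"))
```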
LabelCategories
Structured categorization model for labels applied to a file.
LabelCategories provide a systematic approach to classifying the characteristics and threat indicators of a file. This model organizes labels into distinct categories such as informational tags, technical techniques, and attribution links. By categorizing labels, analysts can efficiently navigate and assess the nature and potential threats associated with a file, streamlining the malware analysis process.
| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| info | List [Keyword] | Informational labels providing context. | Yes | [] |
| technique | List [Keyword] | Labels identifying techniques or triggered detections (e.g., MITRE ATT&CK® TTPs). | Yes | [] |
| attribution | List [Keyword] | Labels related to threat actors or campaigns. | Yes | [] |
Seen
Tracking model for the occurrence and frequency of a file within the system.
The Seen model is designed to record and quantify the instances in which a file is encountered by Assemblyline. It keeps a count of the file's occurrences and logs the timestamps of the first and most recent sightings. This temporal information is crucial for understanding the prevalence and distribution of a file over time, aiding in threat trend analysis and situational awareness.
| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| count | Integer | Number of times the file has been observed. | Yes | 1 |
| first | Date | The timestamp of the file's first sighting. | Yes | NOW |
| last | Date | The timestamp of the file's most recent sighting. | Yes | NOW |
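One way to picture how these three fields evolve is the small sketch below. The `Seen` dataclass and its `record_sighting` helper are hypothetical; they illustrate the bookkeeping the table implies (increment `count`, keep the earliest `first` and the latest `last`) and are not taken from Assemblyline's code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> datetime:
    """Current UTC time, standing in for the NOW default."""
    return datetime.now(timezone.utc)


@dataclass
class Seen:
    count: int = 1
    first: datetime = field(default_factory=_now)
    last: datetime = field(default_factory=_now)

    def record_sighting(self, when: datetime) -> None:
        """Hypothetical helper: bump the counter and widen the time window."""
        self.count += 1
        self.first = min(self.first, when)
        self.last = max(self.last, when)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
seen = Seen(first=start, last=start)
seen.record_sighting(datetime(2024, 3, 15, tzinfo=timezone.utc))
print(seen.count, seen.first.date(), seen.last.date())
```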
URIInfo
Detailed breakdown model of a file's Uniform Resource Identifier (URI).
URIInfo dissects a file's URI into its fundamental components, providing granular data for advanced search and identification. This includes the scheme, network location, path, and other elements such as query parameters and fragments. By parsing these components, Assemblyline allows for a more nuanced analysis of the source and context of a file, which is essential for forensic investigations and threat intelligence gathering.
Each of these descriptions aims to provide a clearer understanding of the purpose and utility of the respective models within Assemblyline, highlighting their roles in the broader context of malware analysis and cyber security operations.
| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| uri | Keyword | Full URI of the file. | Yes | None |
| scheme | Keyword | URI scheme (e.g., http, ftp). | Yes | None |
| netloc | Keyword | Network location including domain and port. | Yes | None |
| path | Keyword | Path within the host. | Optional | None |
| params | Keyword | The parameters component of the URI, often used for session management. | Optional | None |
| query | Keyword | The query string of the URI, containing data for server-side processing. | Optional | None |
| fragment | Keyword | The fragment identifier of the URI, used to navigate to a specific part of the resource. | Optional | None |
| username | Keyword | Username in the URI, if present. | Optional | None |
| password | Keyword | Password in the URI, if present. | Optional | None |
| hostname | Keyword | Hostname extracted from the URI. | Yes | None |
| port | Integer | Port number in the URI. | Optional | None |
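The field names in this table line up with the attributes returned by Python's standard `urllib.parse.urlparse`, so a quick way to see what each component contains is to parse a sample URI. Whether Assemblyline itself uses `urlparse` is an assumption. The `uri_components` helper and the sample URI are hypothetical, and empty optional components are mapped to `None` here only to match the table's defaults.

```python
from urllib.parse import urlparse


def uri_components(uri: str) -> dict:
    """Split a URI into the components named in the URIInfo table."""
    parts = urlparse(uri)
    return {
        "uri": uri,
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        # Empty strings become None to mirror the optional fields' default.
        "path": parts.path or None,
        "params": parts.params or None,
        "query": parts.query or None,
        "fragment": parts.fragment or None,
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
    }


info = uri_components("https://user:secret@example.com:8443/files/report;v=2?id=42#summary")
for name, value in info.items():
    print(f"{name}: {value}")
```

Each component is then searchable on its own, for example by filtering on the hostname without having to match the full URI.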