File

This section documents the schema of the File object model in Assemblyline. Each entry describes one field of a File document in the File index: its data type, a short description, whether it is required, and its default value, if any.

These are the fields you reference when writing Lucene search queries. Using the fields listed in the table below, you can filter and locate files analyzed by Assemblyline by attributes such as type, hash values, classification, and many others.

Use this schema as a reference when you need more targeted searches within Assemblyline.
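
As a minimal sketch of how field names from this schema end up in a Lucene query string, the Python snippet below assembles one from small helpers. The helper names (`term`, `at_least`, `all_of`) are hypothetical, the file type value is only illustrative, and addressing a nested field with a dotted path such as `seen.count` is an assumption rather than something this page states.

```python
def term(field: str, value: str) -> str:
    """Return an exact-match clause such as type:"document/pdf"."""
    # Escape backslashes and double quotes so the value stays inside the quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def at_least(field: str, minimum: float) -> str:
    """Return an inclusive, open-ended range clause such as size:[1024 TO *]."""
    return f"{field}:[{minimum} TO *]"


def all_of(*clauses: str) -> str:
    """Join clauses with AND so that every condition must match."""
    return " AND ".join(clauses)


query = all_of(
    term("type", "executable/windows/pe64"),  # illustrative type value
    at_least("entropy", 7.5),
    at_least("seen.count", 10),  # dotted path to a nested field (assumption)
)
print(query)
```

The resulting string can be pasted into a search box or passed to whichever client you use to query the File index.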

| Field | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| archive_ts | Date | Timestamp indicating when the file was archived. | Optional | None |
| ascii | Keyword | Provides a dotted ASCII representation of the first 64 bytes of the file. | Yes | None |
| classification | Classification | Security classification assigned to the file based on its contents and context. | Yes | None |
| comments | List [Comment] | An array of user-generated comments pertaining to the file. See Comment model for more information. | Yes | [] |
| entropy | Float | A numerical value representing the file's entropy, a measure of the randomness of its content, typically used to detect compression or encryption. High entropy may indicate obfuscation techniques such as encryption, commonly employed by malware to evade detection, but legitimate files can also exhibit high entropy. | Yes | None |
| expiry_ts | Date | Timestamp indicating when the file is scheduled to expire from the system. | Optional | None |
| is_section_image | Boolean | Indicates if the file is an image safe for web browser display, often part of analysis results. | Yes | False |
| is_supplementary | Boolean | Indicates if the file was created by an Assemblyline service as supplementary data. | Yes | False |
| hex | Keyword | Hexadecimal representation of the first 64 bytes of the file. | Yes | None |
| labels | List [Keyword] | Array of descriptive labels applied to the file for categorization and analysis. | Yes | [] |
| label_categories | LabelCategories | Structured categories for the labels applied to the file. | Yes | See LabelCategories for more details. |
| md5 | MD5 | The MD5 hash of the file, used for identifying duplicates and verifying integrity. | Yes | None |
| magic | Keyword | Detailed file format information derived from an analysis using the libmagic library, including text descriptions of the file's content type and encoding. | Yes | None |
| mime | Keyword | The Multipurpose Internet Mail Extensions (MIME) type of the file as determined by libmagic, which identifies file types by checking their headers according to a predefined list of file types. | Optional | None |
| seen | Seen | Records the frequency and timestamps of when the file was encountered. | Yes | See Seen for more details. |
| sha1 | SHA1 | The SHA1 hash of the file, providing a more secure alternative to MD5 for integrity checks. | Yes | None |
| sha256 | SHA256 | The SHA256 hash of the file, offering a high level of security for integrity verification. | Yes | None |
| size | Integer | Size of the file in bytes. | Yes | None |
| ssdeep | SSDeepHash | The fuzzy hash of the file computed with ssdeep, which is useful for identifying similar files. | Yes | None |
| type | Keyword | The file type as determined by the Assemblyline file type identification service. | Yes | None |
| tlsh | Keyword | A locality-sensitive hash (TLSH) of the file's content, useful for similarity comparisons. | Optional | None |
| from_archive | Boolean | Indicates whether the file was retrieved from Assemblyline's archive during processing. | Yes | False |
| uri_info | URIInfo | Detailed components of the file's URI for advanced search functionality. | Optional | None |
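
To make the content-derived fields above more concrete, here is a rough Python sketch that computes values of the same shape from raw bytes using only the standard library. It is not Assemblyline's own implementation: the function names are hypothetical, the choice of printable range for the dotted ASCII view is an assumption, and the entropy is assumed to be Shannon entropy in bits per byte (0.0 to 8.0).

```python
import hashlib
import math
from collections import Counter


def dotted_ascii(data: bytes, length: int = 64) -> str:
    """Render the first `length` bytes, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data[:length])


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for uniform."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def describe(data: bytes) -> dict:
    """Build a dict holding the content-derived fields of the File table."""
    return {
        "ascii": dotted_ascii(data),
        "hex": data[:64].hex(),
        "entropy": shannon_entropy(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


info = describe(b"MZ\x90\x00hello")
print(info["ascii"], info["hex"], info["size"])
```

Fields such as `type`, `magic`, `mime`, `ssdeep`, and `tlsh` are left out because they depend on tools outside the standard library.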

Comment

Model that represents user annotations attached to a file.

A Comment is a user-generated note or observation that can be added to a file within Assemblyline. This feature enables analysts to record insights, share findings, and collaborate on the analysis of a file. Each comment is timestamped and associated with the username of the individual who authored it, creating an audit trail of analytical discourse.

| Field | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| cid | UUID | Unique identifier for the comment. | Yes | None |
| uname | Keyword | The username of the individual who authored the comment. | Yes | None |
| date | Date | The date and time when the comment was posted. | Yes | NOW |
| text | Text | The content of the comment as written by the user. | Yes | None |
| reactions | List [Reaction] | An array of user reactions to the comment, such as likes or dislikes. | Yes | [] |
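
The defaults in the Comment table can be mirrored in a small Python dataclass. This is only an illustrative sketch, not Assemblyline's model class: the class layout is an assumption, NOW is interpreted here as the current UTC time, and the `cid` is generated by the caller with `uuid.uuid4()` because the table lists no default for it.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Comment:
    cid: str    # required, no default in the table
    uname: str  # required, no default in the table
    text: str   # required, no default in the table
    # Default NOW: evaluated each time a Comment is created.
    date: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Default []: a fresh list per comment, never shared between instances.
    reactions: list = field(default_factory=list)


note = Comment(
    cid=str(uuid.uuid4()),
    uname="analyst1",  # hypothetical username
    text="Packed sample; the extracted child file is more interesting.",
)
print(note.cid, note.date.isoformat(), note.reactions)
```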

Reaction

Model that encapsulates user interactions with a comment.

The Reaction model captures the responses of users to comments made on a file. Reactions are simple expressions of agreement, disagreement, or sentiment, represented by a set of predefined icons. These reactions facilitate a quick, non-verbal form of feedback from users, enhancing collaborative analysis and engagement within the Assemblyline platform.

| Field | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| icon | Enum | Icon name representing the type of reaction given to a comment. Supported values are: "love", "party", "smile", "surprised", "thumbs_down", "thumbs_up" | Yes | None |
| uname | Keyword | The username of the individual who reacted to the comment. | Yes | None |
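
Because `icon` is restricted to a fixed set of values, a consumer of this data may want to validate reactions and count them per icon. The following is a hedged sketch of that idea using plain dicts; `make_reaction` and `tally` are hypothetical helpers, not part of Assemblyline.

```python
from collections import Counter

# The six icon names listed as supported values in the table above.
SUPPORTED_ICONS = frozenset(
    {"love", "party", "smile", "surprised", "thumbs_down", "thumbs_up"}
)


def make_reaction(icon: str, uname: str) -> dict:
    """Build a reaction dict, rejecting icons outside the supported set."""
    if icon not in SUPPORTED_ICONS:
        raise ValueError(f"unsupported icon: {icon!r}")
    return {"icon": icon, "uname": uname}


def tally(reactions: list) -> Counter:
    """Count how many reactions were given for each icon."""
    return Counter(r["icon"] for r in reactions)


reactions = [
    make_reaction("thumbs_up", "analyst1"),  # hypothetical usernames
    make_reaction("thumbs_up", "analyst2"),
    make_reaction("surprised", "analyst3"),
]
print(tally(reactions))
```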

LabelCategories

Structured categorization model for labels applied to a file.

LabelCategories provide a systematic approach to classifying the characteristics and threat indicators of a file. This model organizes labels into distinct categories such as informational tags, technical techniques, and attribution links. By categorizing labels, analysts can efficiently navigate and assess the nature and potential threats associated with a file, streamlining the malware analysis process.

| Field | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| info | List [Keyword] | Informational labels providing additional context about the file. | Yes | [] |
| technique | List [Keyword] | An array of labels identifying the specific tactics, techniques, and procedures (TTPs) as defined by the MITRE ATT&CK® framework that are exhibited by the malware within the file. This field also includes labels for any detection signatures that triggered during analysis, providing insight into the malware's behavior and potential impact. Analysts can use these labels to correlate files with known adversary behavior and to enhance threat hunting and incident response activities. | Yes | [] |
| attribution | List [Keyword] | Labels that relate to the attribution of the file, such as the associated threat actor or campaign. | Yes | [] |
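
As a small illustration of working with the three categories, the sketch below flattens a `label_categories` dict into one de-duplicated list while preserving order. The label values are made up for the example, and whether the File `labels` field is actually built this way is not stated here, so treat the relationship as an assumption.

```python
CATEGORIES = ("info", "technique", "attribution")


def all_labels(label_categories: dict) -> list:
    """Flatten the three categories into one list, dropping duplicates."""
    merged = []
    for category in CATEGORIES:
        # A missing category is treated like the default empty list.
        for label in label_categories.get(category, []):
            if label not in merged:
                merged.append(label)
    return merged


example = {
    "info": ["installer"],                        # illustrative labels
    "technique": ["process_injection", "packed"],
    "attribution": ["example_actor", "packed"],   # duplicate on purpose
}
print(all_labels(example))
```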

Seen

Tracking model for the occurrence and frequency of a file within the system.

The Seen model is designed to record and quantify the instances in which a file is encountered by Assemblyline. It keeps a count of the file's occurrences and logs the timestamps of the first and most recent sightings. This temporal information is crucial for understanding the prevalence and distribution of a file over time, aiding in threat trend analysis and situational awareness.

| Field | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| count | Integer | The total number of times the file has been observed by the system. | Yes | 1 |
| first | Date | The timestamp of the file's first sighting. | Yes | NOW |
| last | Date | The timestamp of the file's most recent sighting. | Yes | NOW |
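
The bookkeeping described for Seen can be sketched as follows: a new record starts with `count` 1 and both timestamps set to the current time, and each later sighting increments the count and moves `last` forward while `first` stays fixed. The class and the `record_sighting` method are hypothetical and only illustrate the defaults in the table; how Assemblyline updates these values internally is not shown here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    """Current time in UTC, used for the NOW defaults."""
    return datetime.now(timezone.utc)


@dataclass
class Seen:
    count: int = 1                                  # default 1
    first: datetime = field(default_factory=_now)   # default NOW
    last: datetime = field(default_factory=_now)    # default NOW

    def record_sighting(self, when: Optional[datetime] = None) -> None:
        """Register one more sighting; `first` is intentionally left alone."""
        self.count += 1
        self.last = when if when is not None else _now()


sighting = Seen()
original_first = sighting.first
sighting.record_sighting(datetime(2030, 1, 1, tzinfo=timezone.utc))
print(sighting.count, sighting.first <= sighting.last)
```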

URIInfo

Detailed breakdown model of a file's Uniform Resource Identifier (URI).

URIInfo dissects a file's URI into its fundamental components, providing granular data for advanced search and identification. This includes the scheme, network location, path, and other elements such as query parameters and fragments. By parsing these components, Assemblyline allows for a more nuanced analysis of the source and context of a file, which is essential for forensic investigations and threat intelligence gathering.

| Field | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| uri | Keyword | The complete Uniform Resource Identifier (URI) of the file. | Yes | None |
| scheme | Keyword | The scheme component of the URI (e.g., "http", "ftp"). | Yes | None |
| netloc | Keyword | The network location part of the URI, including the domain name and port. | Yes | None |
| path | Keyword | The path component of the URI, specifying the resource within the host. | Optional | None |
| params | Keyword | The parameters component of the URI, often used for session management. | Optional | None |
| query | Keyword | The query string of the URI, containing data for server-side processing. | Optional | None |
| fragment | Keyword | The fragment identifier of the URI, used to navigate to a specific part of the resource. | Optional | None |
| username | Keyword | The username specified in the URI, if any. | Optional | None |
| password | Keyword | The password specified in the URI, if any. | Optional | None |
| hostname | Keyword | The hostname extracted from the netloc, representing the domain of the URI. | Yes | None |
| port | Integer | The port number extracted from the netloc, representing the communication endpoint. | Optional | None |
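
The URIInfo fields line up closely with the attributes returned by Python's `urllib.parse.urlparse`, so a rough sketch of the decomposition can be written with the standard library alone. Whether Assemblyline uses this exact function is an assumption; the `uri_info` helper is hypothetical, and mapping empty components to `None` is a choice made here to match the "Optional / None" columns.

```python
from urllib.parse import urlparse


def uri_info(uri: str) -> dict:
    """Split a URI into the components listed in the URIInfo table."""
    parts = urlparse(uri)
    return {
        "uri": uri,
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        # urlparse returns "" for absent components; store None instead.
        "path": parts.path or None,
        "params": parts.params or None,
        "query": parts.query or None,
        "fragment": parts.fragment or None,
        # These attributes are already None when absent from the netloc.
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
    }


info = uri_info("https://user:pw@example.com:8443/a/b;p=1?q=2#frag")
print(info["hostname"], info["port"], info["path"], info["params"])
```

With the parts separated like this, a search can target a single component, for example the hostname alone, instead of matching against the full URI string.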