Resources¶
Learn about the storage space and computational resources included with your account.
The Science Requirements Document (Section 3.5) requires that the “Data Management System will also provide at least 10% of its total capability for user-dedicated processing and user-dedicated storage.”
This page quantifies the individual and shared resources that every account at the US Data Access Center (US DAC) has access to, by default, via the Rubin Science Platform at https://data.lsst.cloud/.
Important
The US DAC, the Rubin Science Platform, and this documentation are currently under construction. The estimates quoted below include current, planned, and to-be-determined (TBD) values, all of which are subject to change as hardware, software, and user habits evolve.
Page last updated: Tue Feb 18 2025
Number of users¶
Design specification: 10,000 individual user accounts.
Number of simultaneous users above which service may degrade:
Notebook Aspect (JupyterLab servers): 517 a
Portal Aspect sessions: TBD
API connections: TBD
Maximum number of simultaneous users (hard limit):
Notebook Aspect (JupyterLab servers): TBD
Portal Aspect sessions: TBD
API connections: TBD
Maximum number of services accessed simultaneously per user:
Notebook Aspect (JupyterLab servers): 1 b
Portal Aspect sessions: 1 b
API connections: TBD
Notebook sessions will be automatically shut down after 5 days of inactivity, or after 25 days in total.
a This is the number of science platform cores for users, from row one of Table 37 in the DM Sizing Model. Note that the RSP was designed to include 517 cores for users, and to expand to accommodate more simultaneous users. Table 43 shows this increasing to 4664 by LSST year 10.
b But, users can have multiple browser tabs open to the same session.
Computational processing¶
Computational resources are available via the Notebook Aspect (JupyterLab).
Notebook server options:
Small (1.0 CPU, 4 GB RAM)
Medium (2.0 CPU, 8 GB RAM)
Large (4.0 CPU, 16 GB RAM)
Only CPUs (central processing units) are available. No GPUs (graphics processing units) are available, and there is no plan to add them.
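The server options above trade CPU and RAM against shared capacity. A minimal sketch of how one might choose among them, assuming the sizes listed above; the helper name and the 2x safety factor are illustrative, not part of the RSP:

```python
# Illustrative helper (not an RSP API): pick the smallest Notebook Aspect
# server size whose RAM covers an estimated in-memory working set.
# Sizes are taken from the list above; the safety factor is an assumption
# to leave headroom for the kernel, libraries, and intermediate copies.

SERVER_SIZES = [
    ("Small", 1.0, 4),    # name, CPUs, RAM in GB
    ("Medium", 2.0, 8),
    ("Large", 4.0, 16),
]

def pick_server(working_set_gb: float, safety_factor: float = 2.0) -> str:
    """Return the smallest server whose RAM covers the estimated
    working set times the safety factor."""
    needed = working_set_gb * safety_factor
    for name, _cpus, ram_gb in SERVER_SIZES:
        if ram_gb >= needed:
            return name
    raise ValueError("Working set too large for any server size; "
                     "consider chunking the query.")
```

For example, a 1.5 GB working set fits a Small server (3 GB needed of 4 GB), while a 7 GB working set requires a Large one.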
Cores per science user: 0.1 c
c The number of cores per science user is from Table 37 in the DM Sizing Model. Table 43 shows this increasing to 0.6 by LSST year 10. It is \(<1\) because it includes oversubscription and assumes not all users are simultaneously connected.
Batch processing¶
Batch processing refers to asynchronous (parallel) job submission. The user batch facility focuses on supporting a large variety of smaller needs from the broader community (DMTN-202). User batch processing will be available by Data Release 1 (DR1), and access will be allocated by the Resource Allocation Committee.
Number of core hours total per year: 4.53E+06 d
Number of core hours per user available via RAC: TBD; to be set by DR1
d This preliminary estimate is 10% of the total number of core-hours needed for Data Release Processing as quoted in Table 27, Section 6.1 of the DM Sizing Model, and is subject to change.
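The footnote's arithmetic can be made explicit. In this sketch, the Data Release Processing total is back-derived from the quoted 4.53E+06 figure and is illustrative only; only the 10% fraction comes from the SRD requirement:

```python
# Sketch of the batch allocation arithmetic described above: the user
# batch pool is 10% of the Data Release Processing core-hours
# (DM Sizing Model, Table 27). The DRP total below is back-derived
# from the quoted 4.53E+06 figure, so it is an assumption.

DRP_CORE_HOURS_PER_YEAR = 4.53e7   # implied DRP total (assumption)
USER_FRACTION = 0.10               # SRD Section 3.5 requirement

user_batch_core_hours = USER_FRACTION * DRP_CORE_HOURS_PER_YEAR
print(f"{user_batch_core_hours:.2e}")  # matches the quoted 4.53e+06
```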
Storage¶
During the Early Science era (the Data Previews and Data Release 1), the total amount of shared user space across all user directories (home, project, and scratch) is 24 TB.
During the Operations era, the anticipated amount of individual home directory disk space (in the cloud) is 0.4 TB e.
The total shared disk space for batch users approved by the RAC remains TBD.
e This preliminary estimate comes from the “Storage per science user” row of Table 31, Section 7.2 of the DM Sizing Model. Table 39 shows this increasing to 1.3 TB by LSST year 10.
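Users may want to track their own usage against these quotas. A minimal sketch, assuming the Operations-era 0.4 TB home-directory estimate quoted above (decimal terabytes assumed); the function names are illustrative, not an RSP tool:

```python
# Illustrative quota check (not an RSP tool): walk a directory tree
# and compare its total size against the anticipated 0.4 TB
# home-directory quota quoted above.

import os

HOME_QUOTA_BYTES = int(0.4e12)  # 0.4 TB; decimal terabytes assumed

def directory_size_bytes(path: str) -> int:
    """Total size of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):   # skips broken symlinks
                total += os.path.getsize(fp)
    return total

def quota_fraction_used(path: str) -> float:
    """Fraction of the assumed home-directory quota consumed."""
    return directory_size_bytes(path) / HOME_QUOTA_BYTES
```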
Backups¶
Users are encouraged to use services such as GitHub for software version control and to take care not to accidentally delete files from their home directory.
While there is no guarantee that accidentally deleted data can be recovered, users are encouraged to use the resources for Getting help immediately if mistakes do happen.
Query and memory limits¶
All queries are executed with shared resources. The time to query completion depends first on query design (the number of shards accessed), and second on the number of concurrent queries across all users. There is no limit on the total number of queries a user can submit (daily, yearly, or overall), but query rates are limited.
The size of a dataset retrieved by a query and held in memory depends on the server size which, for the Notebook Aspect, is selected by the user.
Number of TAP queries per user per 15 minutes: 500 f
Reset interval after a user exceeds the quota: 15 minutes g
f A nominal quota configuration in the RSP quotas and rate limiting document.
g Also from the RSP quotas and rate limiting document.
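A client that issues many automated queries can throttle itself to stay under the nominal 500-queries-per-15-minutes quota. A sliding-window sketch, purely illustrative; the actual enforcement is server-side, as described in the RSP quotas and rate limiting document:

```python
# Client-side throttle sketch for the nominal TAP quota above:
# at most `max_calls` queries in any `window_s`-second window.
# Illustrative only; server-side enforcement is authoritative.

import time
from collections import deque
from typing import Optional

class SlidingWindowLimiter:
    def __init__(self, max_calls: int = 500, window_s: float = 900.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self._stamps = deque()  # monotonic timestamps of recent calls

    def acquire(self, now: Optional[float] = None) -> None:
        """Block until a call is permitted, then record it."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest call leaves the window.
            time.sleep(self.window_s - (now - self._stamps[0]))
            now = time.monotonic()
            while self._stamps and now - self._stamps[0] >= self.window_s:
                self._stamps.popleft()
        self._stamps.append(now)
```

Calling `limiter.acquire()` before each TAP query then paces submissions so the quota is never exceeded.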
Portal Aspect TAP (Table Access Protocol) service:
Maximum rows returned: 5,000,000
Maximum table size returned: TBD
Portal Aspect ObsTAP (TAP access to images):
Maximum rows of image metadata: 5,000,000
Notebook Aspect TAP (Table Access Protocol) service:
Maximum rows returned: 5,000,000
Maximum table size returned: RAM limit of the user-selected server size
Notebook Aspect Butler service:
Maximum number of simultaneous Butler queries per user: TBD (2-5)
Maximum number of references returned: no limit
Maximum data volume returned: RAM limit of the user-selected server size
API Aspect TAP (Table Access Protocol) service:
Maximum rows returned: 5,000,000
Maximum table size returned: TBD
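Result sets larger than the 5,000,000-row cap quoted above can be retrieved in pages. A sketch of the paging arithmetic, assuming the service supports offset-based retrieval (e.g., via a query's row-limit and offset controls); how the bounds are applied to an actual query is service-specific:

```python
# Sketch of splitting a large retrieval into pages that each respect
# the 5,000,000-row cap quoted above. Only the arithmetic is shown;
# applying (offset, n_rows) to a real TAP query is service-specific.

MAX_ROWS = 5_000_000

def page_bounds(total_rows: int, page_size: int = MAX_ROWS):
    """Yield (offset, n_rows) pairs covering `total_rows` in order,
    each with n_rows <= page_size."""
    for offset in range(0, total_rows, page_size):
        yield offset, min(page_size, total_rows - offset)
```

For example, a 12,000,000-row result splits into two full pages and one 2,000,000-row remainder.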
Download and upload limits¶
These estimates remain largely to-be-determined (TBD) and might depend sensitively on user load. The Data Previews will be used to quantify and optimize user experience with respect to data transfers.
The amount of data a user may download or upload, and the data transfer rates, depend also on the user’s internet service provider.
Minimum data transfer rates:
Download: TBD
Upload: TBD
Maximum data volumes:
Download table size: 6 GB
Download image(s) size: TBD
Download daily total: TBD
Upload table size: TBD
Resource Allocation Committee (RAC)¶
Individuals and groups in need of more than the standard resources, and/or who require batch processing via the RSP deployed at the US DAC (data.lsst.cloud), will submit proposals to the Resource Allocation Committee (RAC).
The quantities of the resources that the RAC will allocate, and the process by which the RAC will operate, are currently under development (see RTN-084).
Independent Data Access Centers (IDACs)¶
Individuals and groups who need more than the standard or batch resources available via the US DAC, or who need, e.g., GPUs, specialized software, or non-Rubin data sets, should consider using one of the Independent Data Access Centers (IDACs). Some IDACs might contribute their resources for allocation by the RAC.
More information about IDACs is in development.