msg403776 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-13 01:45 |
Starting with gdbm 1.21, gdbm supports a crash tolerance feature.
We may need to provide APIs for it when Python is built against those versions.
https://www.gnu.org.ua/software/gdbm/manual/Crash-Tolerance.html
The following features would be provided if the user is using gdbm >= 1.21:
- Provide `GDBM_NUMSYNC` as the `s` flag.
- Provide an API for gdbm_failure_atomic()
- Provide an API for gdbm_latest_snapshot()
|
msg403777 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-13 01:45 |
I am going to work on this issue :)
|
msg403780 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-13 01:54 |
FYI, I got an email about this feature from Terence Kelly, who designed these amazing things :)
|
msg403809 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-10-13 08:47 |
See also issue22035.
|
msg403895 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-10-14 11:35 |
How would it be used from Python? What are the scenarios?
|
msg403904 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-14 14:43 |
> How would it be used from Python? What are scenarios?
I am preparing a PoC and will share it with you once it's done.
Starting with gdbm 1.21, gdbm provides two database formats: the standard format and the extended format.
To create a gdbm database in the extended format, the GDBM_NUMSYNC flag needs to be supported.
Without it, there is no way to create an extended-format database file from the Python module.
I am thinking about the following usage:
```
import dbm.gnu as dbm
db = dbm.open('x.db', 'nx')
```
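Since the extended format only exists from gdbm 1.21 on, calling code would probably need to cope with builds that lack it. dbm.gnu already exposes `open_flags`, a string listing the flag characters `open()` accepts, so a minimal sketch, assuming the 'x' character proposed here, could feature-detect the extended format and fall back to the standard one. The helper names `pick_create_flag` and `create_db` are hypothetical, and released dbm.gnu versions may not accept 'x' at all.

```python
try:
    import dbm.gnu as gdbm
except ImportError:  # CPython built without the _gdbm extension
    gdbm = None


def pick_create_flag(open_flags: str) -> str:
    """Choose the flag string for creating a brand-new database.

    'n' always creates a new, empty database. The 'x' modifier is the
    extended-format flag proposed in this issue (an assumption); it is only
    appended when the running dbm.gnu reports it in open_flags.
    """
    return "nx" if "x" in open_flags else "n"


def create_db(path: str):
    """Create a new gdbm database, using the extended format if available."""
    if gdbm is None:
        raise RuntimeError("dbm.gnu is not available in this Python build")
    return gdbm.open(path, pick_create_flag(gdbm.open_flags))
```

On a build without 'x', `create_db` quietly produces a standard-format file, so snapshot support would have to be checked separately before relying on it.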
|
msg403907 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-10-14 15:35 |
And what's next?
|
msg403908 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-14 15:43 |
> And what's next?
I may introduce the following API, which is only available for the extended format.
(I am building a PoC with it.)
> dbm.gdbm_failure_atomic('snapshot_path0', 'snapshot_path1')
As for gdbm_latest_snapshot(), I am still unsure whether the Python module should provide this API, so I am still discussing it with the original authors.
|
msg403909 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-14 15:44 |
Sorry, I don't know whether my answer is enough.
Do you have any ideas, or opposing views?
|
msg403910 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-14 15:46 |
dbm.gdbm_failure_atomic('snapshot_path0', 'snapshot_path1')
-> db.gdbm_failure_atomic('snapshot_path0', 'snapshot_path1')
|
msg403911 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-10-14 15:52 |
I am interested in how gdbm_failure_atomic() and gdbm_latest_snapshot() can be used in user code. Some real-world examples, please. If I understand correctly, they look very low-level and require additional boilerplate code to be useful.
|
msg403912 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-14 15:57 |
Yeah, I agree about gdbm_latest_snapshot(); that API might be too low-level.
But I am a RocksDB user who uses the RocksDB API directly on a production platform for caching.
When I think about how my team and I use RocksDB, gdbm_failure_atomic() on its own can be useful for users who need snapshots to recover from exceptional situations.
|
msg403913 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-10-14 16:12 |
Please show examples.
|
msg403916 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-14 16:34 |
> Please show examples.
Sorry, I don't understand: do you mean a caching example or a recovery example?
When we use RocksDB, we can create checkpoint files on the local file system,
and we can recover the database from them.
http://rocksdb.org/blog/2015/11/10/use-checkpoints-for-efficient-snapshots.html
If gdbm provides something similar, users can recover the database when an accident happens, which I believe happens from time to time (power outages, hardware faults, etc.).
The recovery tool itself does not need to be written in Python, but the Python client needs an option to save snapshots, which it currently lacks.
Please read how to recover a gdbm database from snapshots:
https://www.gnu.org.ua/software/gdbm/manual/Crash-recovery.html
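Choosing between the two snapshots can be modelled in a few lines. The sketch below is a simplified model based on our reading of the Crash-recovery page: a snapshot whose permission bits are read-only holds a completed sync, and the newest such snapshot wins. It ignores the `numsync` counter that extended-format databases also record and the detailed statuses `gdbm_latest_snapshot()` reports, so for real recovery rely on `gdbm_latest_snapshot()` or gdbmtool's `snapshot` command. `SnapshotInfo`, `is_complete`, `pick_latest`, and `stat_snapshot` are hypothetical names.

```python
import os
import stat
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SnapshotInfo:
    path: str
    mode: int      # permission bits, e.g. 0o400
    mtime_ns: int  # last-modification time in nanoseconds


def is_complete(info: SnapshotInfo) -> bool:
    """Assumed rule: owner-readable and not owner-writable means finished."""
    return bool(info.mode & stat.S_IRUSR) and not info.mode & stat.S_IWUSR


def pick_latest(snapshots: Sequence[SnapshotInfo]) -> Optional[str]:
    """Return the path of the newest complete snapshot, or None.

    None means no snapshot looks complete, or two complete snapshots share
    the same timestamp so this simplified model cannot decide.
    """
    complete = sorted(
        (s for s in snapshots if is_complete(s)),
        key=lambda s: s.mtime_ns,
        reverse=True,
    )
    if not complete:
        return None
    if len(complete) > 1 and complete[0].mtime_ns == complete[1].mtime_ns:
        return None
    return complete[0].path


def stat_snapshot(path: str) -> SnapshotInfo:
    """Collect the fields the model needs from the file system."""
    st = os.stat(path)
    return SnapshotInfo(path, stat.S_IMODE(st.st_mode), st.st_mtime_ns)
```

For example, `pick_latest([stat_snapshot('even_snapshot.bin'), stat_snapshot('odd_snapshot.bin')])` would return the name of the snapshot to restore from, or None when a human has to look.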
|
msg403924 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-10-14 17:51 |
Examples of using the new feature.
|
msg404065 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-16 09:47 |
I've done my PoC in my local environment.
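```
import dbm.gnu as dbm

# 'x' is the PoC flag for creating an extended-format database
db = dbm.open('x.db', 'nx')
# PoC method: keep two alternating snapshot files for crash recovery
db.gdbm_failure_atomic('even_snapshot.bin', 'odd_snapshot.bin')
for k, v in zip('abcdef', 'ghijkl'):
    db[k] = v
    db.sync()  # each sync refreshes one of the two snapshots
db.close()
```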
Running this on a local filesystem creates both snapshot files, which can be used to recover the x.db file:
```
gdbmtool> snapshot even_snapshot.bin odd_snapshot.bin
GDBM_SNAPSHOT_OK: Selected the most recent snapshot.
odd_snapshot.bin: 400 r-------- 1634377177.462498326 6
```
You can use odd_snapshot.bin as the last successful snapshot file:
```
>>> import dbm.gnu as dbm
>>> db = dbm.open('odd_snapshot.bin', 'r')
>>> db.keys()
[b'c', b'f', b'a', b'd', b'b', b'e']
```
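Instead of opening the snapshot directly, a user may want x.db itself back. A minimal sketch, assuming the recovery step is simply to replace the database file with a copy of the chosen snapshot (the gdbm Crash-recovery page remains the authoritative procedure); `restore_from_snapshot` is a hypothetical helper, not part of dbm.gnu.

```python
import os
import shutil
import tempfile


def restore_from_snapshot(snapshot_path: str, db_path: str) -> None:
    """Replace db_path with a copy of snapshot_path.

    The snapshot itself is left untouched. The copy is written to a
    temporary file in the destination directory and then moved into place
    with os.replace(), so db_path never holds a half-written file.
    """
    dest_dir = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".restore-")
    os.close(fd)
    try:
        # copyfile copies data only, so the temp file keeps mkstemp's 0600
        # mode instead of the snapshot's read-only bits
        shutil.copyfile(snapshot_path, tmp_path)
        os.replace(tmp_path, db_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Copying rather than renaming keeps both snapshots available in case the restored file turns out to be unusable.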
|
msg404066 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-16 09:51 |
So IMHO, these APIs are not that low-level: users only need to create a file with the 'x' flag and then call gdbm_failure_atomic().
If saving the file then fails because of an accident,
they can easily restore the local database file from the latest valid snapshot.
|
msg404189 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-18 14:49 |
@serhiy
What is your main concern about this feature?
|
msg404190 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-18 14:50 |
If you think this feature is not useful, I will ask the gdbm maintainers to propose the idea on the Python-dev mailing list if they want this feature enabled.
|
msg404200 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-10-18 17:07 |
The main concern is that it is not clear how to use this feature, and if it is not clear, it will not be used. I am not even sure that it is Pythonic, because I do not know how to use it. For example, can it be used to implement transactions? How does it work with multithreading and multiprocessing, if it works at all? Does it restore automatically after a failure, or does it need some action from the user? And how can the user know that some action is required?
|
msg404262 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2021-10-19 02:14 |
Wow, this is a longer discussion than I expected. First, I hope my opinion doesn't make you uncomfortable :)
> The main concern is that it is not clear how to use this feature, and if it is not clear
IMHO, for end users this feature is at a similar level of usage as the gdbm.reorganize() API.
https://docs.python.org/3/library/dbm.html?highlight=gdbm#dbm.gnu.gdbm.reorganize
I already showed how end users would use this API in msg404065,
so I won't explain the usage again.
> For example, can it be used to implement transactions?
AFAIK, this feature is only used to leave snapshot files behind
so that users can recover when they need to (for example, when an aging disk brings the system down, in a disaster, or after an unexpected system fault).
And snapshot files can be stored anywhere (a separate secondary disk, a remote-mounted disk).
So, if you ask whether snapshots are important: from my side, *yes*; they guarantee that we can recover the file when we want to.
IMHO, whether to use this API depends on the end user's purpose.
> I am not even sure that it is Pythonic,
Hmm, do you mean the API signature? Python has a long tradition of providing thin wrappers around C functions (gdbm.reorganize() is a good example).
Since the gdbm module is the most accessible gdbm client for Python users today, I think we have to provide this feature, because the gdbm authors wrote it for end users.
Otherwise, the authors would not have exposed these APIs through `gdbmtool`.
FYI, gdbmtool is a CLI tool for performing basic gdbm operations.
If you install gdbm 1.21 on your local machine, you can try the crash tolerance features simply through gdbmtool.
The essence of this feature is simple:
* If you want gdbm to leave snapshot files, create the gdbm file in the extended format (the 'x' flag) and then call gdbm_failure_atomic().
If you don't agree, I would suggest taking this to the mailing list, and I may ask the gdbm authors to raise it there, since they requested this feature from me by email and they know gdbm far better than I do.
Thanks for reading.
|
msg410840 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2022-01-18 02:56 |
As you said, the gdbm module's purpose is effectively just to expose the underlying C library APIs to Python.
Consider this a +1 in favor of exposing the new APIs in the Python gdbm module.
I'm not concerned about anyone wanting these in older Python versions. Using them really requires a combination of modern software: a recent kernel version, a non-default fancy filesystem, and linking against a recent gdbm library. So being a new feature in 3.11 makes sense. Anyone satisfying all of the above is highly likely to already also use a recent CPython.
If anyone _really_ wants it for older things, they can maintain a backport on PyPI.
|
msg411093 - (view) |
Author: Dong-hee Na (corona10) * |
Date: 2022-01-21 08:12 |
After discussing this with Victor via DM, I decided to provide a high-level API instead of low-level APIs:
- gdbm.open(filename, snapshots=(foo, bar)) will do everything at once.
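A rough pure-Python sketch of what such a single call could do: open the file in the extended format and enable snapshots in one step, closing the database if enabling fails. The function name `open_with_snapshots`, the `opener` parameter, and the 'x' flag handling are assumptions built on the PoC from msg404065; `gdbm_failure_atomic` is the PoC method name from this thread, not a released dbm.gnu API.

```python
from typing import Any, Callable, Optional, Tuple


def open_with_snapshots(
    filename: str,
    flag: str,
    snapshots: Optional[Tuple[str, str]] = None,
    *,
    opener: Optional[Callable[[str, str], Any]] = None,
) -> Any:
    """Open a gdbm database and, if requested, enable crash tolerance.

    snapshots is an (even, odd) pair of snapshot file names. Everything
    beyond plain opening relies on the PoC API from this thread.
    """
    if opener is None:
        import dbm.gnu  # needs the _gdbm extension

        opener = dbm.gnu.open
    if snapshots is None:
        return opener(filename, flag)
    even, odd = snapshots
    if "x" not in flag:
        flag += "x"  # PoC flag: snapshots need the extended format
    db = opener(filename, flag)
    try:
        db.gdbm_failure_atomic(even, odd)  # PoC method, not a released API
    except BaseException:
        db.close()  # do not leak a half-configured handle
        raise
    return db
```

Hiding both steps behind one call means users cannot forget the extended-format flag, and `opener` lets the wrapper be exercised with a stand-in object where no gdbm is installed.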
Regards,
Dong-hee
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:51 | admin | set | github: 89615 |
2022-01-21 08:13:06 | corona10 | set | nosy:
+ vstinner
|
2022-01-21 08:12:36 | corona10 | set | messages:
+ msg411093 |
2022-01-18 02:56:24 | gregory.p.smith | set | nosy:
+ gregory.p.smith messages:
+ msg410840
|
2021-10-19 02:14:27 | corona10 | set | messages:
+ msg404262 |
2021-10-18 17:07:59 | serhiy.storchaka | set | messages:
+ msg404200 |
2021-10-18 14:50:52 | corona10 | set | messages:
+ msg404190 |
2021-10-18 14:49:09 | corona10 | set | messages:
+ msg404189 |
2021-10-16 09:51:50 | corona10 | set | messages:
+ msg404066 |
2021-10-16 09:47:18 | corona10 | set | messages:
+ msg404065 |
2021-10-14 19:46:01 | vstinner | set | nosy:
- vstinner
|
2021-10-14 17:51:12 | serhiy.storchaka | set | messages:
+ msg403924 |
2021-10-14 16:34:32 | corona10 | set | messages:
+ msg403916 |
2021-10-14 16:12:47 | serhiy.storchaka | set | messages:
+ msg403913 |
2021-10-14 15:57:26 | corona10 | set | messages:
+ msg403912 |
2021-10-14 15:52:25 | serhiy.storchaka | set | messages:
+ msg403911 |
2021-10-14 15:46:42 | corona10 | set | messages:
+ msg403910 |
2021-10-14 15:44:59 | corona10 | set | messages:
+ msg403909 |
2021-10-14 15:43:53 | corona10 | set | messages:
+ msg403908 |
2021-10-14 15:35:51 | serhiy.storchaka | set | messages:
+ msg403907 |
2021-10-14 14:43:48 | corona10 | set | messages:
+ msg403904 |
2021-10-14 11:35:05 | serhiy.storchaka | set | messages:
+ msg403895 |
2021-10-14 09:34:50 | corona10 | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request27230 |
2021-10-13 08:47:12 | serhiy.storchaka | set | messages:
+ msg403809 |
2021-10-13 01:54:31 | corona10 | set | messages:
+ msg403780 |
2021-10-13 01:52:32 | corona10 | set | title: Support crash tolerance for gdbm module -> Support crash tolerance feature for gdbm module |
2021-10-13 01:52:14 | corona10 | set | nosy:
+ serhiy.storchaka
|
2021-10-13 01:45:43 | corona10 | set | messages:
+ msg403777 |
2021-10-13 01:45:22 | corona10 | create | |