2022-07-27  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.2.2

2022-07-27  Eitan Raviv  <eraviv@redhat.com>

	Revert "network: use nmstate for bridge options"
	This reverts commit 35e919fd23023f64bb4c59de9631363b675f165b.
	The revert is temporary until a fix for [1] is merged.
	[1] https://bugzilla.redhat.com/2108974

	Revert "network: nmstate bridge options - test w/ multicast-router"
	This reverts commit 494e4c4604b7fee80ddbffb56de559a1d3aa71f6.
	The revert is temporary until a fix for [1] is created.
	[1] https://bugzilla.redhat.com/2108974

2022-07-26  Albert Esteve  <aesteve@redhat.com>

	multipath: make isconfigured fail if ufn enabled
	Vdsm does not support 'user_friendly_names' in the
	multipath configuration, and the host is currently
	configured accordingly. However, this configuration
	can be overridden, potentially causing LVM corruption,
	and Vdsm does not check for it at all.

	This can be checked in the 'vdsm-tool is-configured' call
	to prevent Vdsm from starting after an upgrade if the
	configuration is not correct.

	If the attribute is enabled, fail in the isconfigured
	call from the tool.multipath module and print the
	offending section and attributes. This will prevent
	Vdsm from starting and will point the user to the
	right fix. Example:

	$ vdsm-tool is-configured --module multipath
	WARNING: Invalid configuration: 'user_friendly_names' is
	enabled in multipath configuration:
	  section1 {
	    key1 value1
	    user_friendly_names yes
	    key2 value2
	  }
	  section2 {
	    user_friendly_names yes
	  }
	This configuration is not supported and may lead to storage domain corruption.

	Bug-Url: https://bugzilla.redhat.com/1793207

	mpathconfig: check user_friendly_names is disabled
	Check that user_friendly_names is disabled in the
	multipathd local configuration.

	Take the parsed multipath configuration provided
	by mpathconf._parse_conf and loop through it.
	If user_friendly_names is enabled in any section
	of the configuration, collect the offending section
	so that it can be printed.

	mpathconfig: parse multipathd config
	Add a parser function to the mpathconf module
	that takes an iterable of strings in the
	multipath configuration format and creates an
	iterable data structure from it.

	This will allow other functions and/or modules
	to have the multipath configuration accessible in
	memory, to inspect it, check it, etc.

2022-07-25  Albert Esteve  <aesteve@redhat.com>

	blockVolume: check parent tag and meta match
	blockVolume getParent method uses the volume tag
	PU_<id> to obtain the parent. However, this tag
	could be outdated. For example, when the host
	syncs a volume chain after removing a volume,
	it only updates the metadata. Afterwards,
	the SPM will update the volume tag once the
	volume is deleted.

	To ensure we do not return outdated parent UUID,
	we need to ensure that the parent UUID obtained
	from the parent tag, and the UUID in the volume
	metadata match. If they do not, we try to reload
	the volume tag first. If they still do not match
	after that, log a warning and use the metadata parent.

	Bug-Url: https://bugzilla.redhat.com/2103582
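
	The reconciliation described above could be sketched roughly like
	this (illustrative names only; the real blockVolume method and tag
	handling differ):

```python
import logging

# Sketch of the parent reconciliation; get_parent_uuid and reload_tag
# are hypothetical names, not vdsm's actual API.

def get_parent_uuid(tag_parent, meta_parent, reload_tag):
    """Prefer a tag that agrees with metadata; fall back to metadata."""
    if tag_parent == meta_parent:
        return tag_parent
    # The tag may be outdated after a volume chain sync: reload once.
    tag_parent = reload_tag()
    if tag_parent == meta_parent:
        return tag_parent
    logging.warning(
        "Parent tag %s does not match metadata %s, using metadata",
        tag_parent, meta_parent)
    return meta_parent
```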

2022-07-25  Nir Soffer  <nsoffer@redhat.com>

	tests: Add script for releasing the SPM lease
	This is useful for testing the SPM watchdog and the panic function.

	Related-to: https://bugzilla.redhat.com/1961752

	panic: Fix panic when logging.shutdown hang
	We tried to add a timeout using signal.alarm() so that vdsm will
	panic if shutting down logging hangs. However, this does not work, since
	the alarm is delivered to the main thread instead of the thread setting
	the alarm.

	What actually happened after we added the alarm was:

	1. Panic thread: calls signal.alarm(10)
	2. Panic thread: calls logging.shutdown(), call hangs
	3. Main thread: crash with "RuntimeError: Alarm timeout"

	When we removed zombiereaper, we also removed the SIGALRM signal handler
	by mistake, so what happens in current code is:

	1. Panic thread: calls signal.alarm(10)
	2. Panic thread: calls logging.shutdown(), call hangs
	3. Unhandled SIGALRM terminates the process

	Replace the broken alarm with a thread waiting on an event.

	Fixes: 56ed0796259a (panic: Limit time spent shutting down logging)
	Fixes: c0a67a0a3094 (zombiereaper: Reap out last traces of it)
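
	A minimal sketch of the event-based timeout (illustrative only; the
	real panic() also aborts the process after logging is shut down):

```python
import os
import threading

def panic_with_timeout(shutdown_logging, timeout=10):
    # Sketch: a watchdog thread waits on an event; if shutting down
    # logging hangs past the timeout, the watchdog kills the process.
    done = threading.Event()

    def watchdog():
        if not done.wait(timeout):
            os._exit(1)  # hung shutdown: terminate immediately

    threading.Thread(target=watchdog, daemon=True).start()
    shutdown_logging()  # may hang; the watchdog covers that case
    done.set()
```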

2022-07-22  Nir Soffer  <nsoffer@redhat.com>

	vdsmd: Document why we handle SIGUSR1 signal
	When using export domain we use safelease instead of sanlock. During
	fencing flows, it sends SIGUSR1 signal to vdsm to stop the SPM.

2022-07-21  Yedidyah Bar David  <didi@redhat.com>

	hooks: Add log-firmware
	This is a combination of the interface of log-console and the
	functionality of qemucmdline, hard-coded to log the firmware output to a
	log file. qemucmdline can't be used directly for HostedEngine, because
	we want a separate log file per VM, and qemucmdline does not have
	internally functionality to allow doing that, relying instead on the
	engine to supply correct params somehow. For HostedEngine, we can't rely
	on the engine, as it does not exist at this point.

2022-07-19  Milan Zamazal  <mzamazal@redhat.com>

	virt: Don’t retrieve VM external data until the VM is started
	VM external data, if not provided by Engine, may not be available
	initially and may be created only once the VM is started by libvirt.
	If its retrieval is attempted before the data is available, an error
	message is produced in the logs.  This error message is harmless but
	incorrect and it may confuse users or prevent distinguishing real
	errors from fake errors.

	Let’s not retrieve external data in any of the VM initial states, not
	only in MIGRATION_DESTINATION, to prevent the external data retrieval
	error.

2022-07-18  Albert Esteve  <aesteve@redhat.com>

	ci: update test-storage job name
	Update ci job test-storage to test-storage-user
	to be consistent with its test-storage-root counterpart.

2022-07-15  Albert Esteve  <aesteve@redhat.com>

	tox: remove storage environment
	Remove the tox storage environment as it is no
	longer used in the CIs. Consequently, also remove
	the associated make target 'tests-storage'.

	ci: split root and non-root storage tests
	Most of vdsm code runs as vdsm:kvm, but vdsm tests run
	as root by default in the CI, causing several problems, e.g.:
	- Different behaviour when run locally and in the CI
	- Some tests are skipped as root

	We can separate the tests in two different jobs:
	- storage-tests: run tests marked as "not root".
	  It still requires the container to be privileged
	  for setting up the storage.
	- storage-tests-root: run tests marked as "root".

	Fixes: #128

	makefile: tests-storage targets for user and root
	Add targets for running tox in the storage-user
	and storage-root environments.

	tox.ini: add user and root storage envs
	Add tox environments for storage user and root
	tests (i.e., storage-user and storage-root, respectively).
	These environments are a subset of the storage
	environment, that is split considering the root mark.

	This way, we can easily run tests marked as root,
	and tests marked as non-root, separately.

	The new environments have their own different
	target coverage. The 'storage-root' environment
	sets a different coverage file and folder to avoid
	overwriting other environment's coverage reports
	with root ownership.

	storage: mark root all tests that req root
	Mark all storage tests that require root with
	the root label so that they get correctly picked
	up or excluded by the CI jobs.
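
	Illustrative use of the marker (test names here are hypothetical;
	the marker name "root" matches the commit):

```python
import os
import pytest

@pytest.mark.root
def test_mount_device():
    # Runs only in the root CI job (pytest -m root).
    assert os.geteuid() == 0

def test_parse_metadata():
    # Regular test, runs in the non-root job (pytest -m "not root").
    assert "a=1".split("=") == ["a", "1"]
```

	The two CI jobs would then select tests with `pytest -m root` and
	`pytest -m "not root"` respectively.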

2022-07-13  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.2.1

2022-07-13  Vojtech Juranek  <vjuranek@redhat.com>

	virt: switch domain to Defined once migrated VM is stopped
	When the VM is stopped on the source after migration, set VM status to
	Down and also switch domain to Defined state. The domain is not running
	and not all calls are available -- some would result in a libvirt error.
	A defined domain will throw virdomain.NotConnectedError, which should be
	handled by the caller.

	Added handling of this exception to the Vm._destroyVm* methods.
	The methods are not called on a not connected domain, so this is only a
	preventive measure to be on the safe side.

	It would be more practical to change the domain to Defined much sooner,
	ideally in _handle_libvirt_domain_stopped(). That would avoid any delays
	especially from calling hooks. But postponing the change and doing it
	together with state change later makes the behavior more predictable and
	more consistent.

	Bug-Url: https://bugzilla.redhat.com/2000046

2022-07-06  rokkbert  <rokkbert@gmail.com>

	Clarified the development and tests md
	- Completed the "git clone" example.
	- Added instructions to run "autogen.sh" before "make venv".
	- Removed optional "git clean" command because it was found to be
	  unnecessary. The Makefile gets regenerated anyway.
	- Added "Testing" section to doc/development.md because tests are
	  optional for just building VDSM.
	- Moved optional "configure" to the Advanced section because it's very
	  rare that some parameters really need to be adjusted.
	- Fixed broken "tox" example where the search path wasn't specified,
	  resulting in too many tests being collected.
	- Gave test examples for different modules because some need the
	  explicit search path when passing arguments to pytest and some don't.
	- Added --no-cov to examples for partial tests because the coverage
	  warning is useless in those cases.
	- Used multiple arguments in PYTEST_ADDOPTS to show how it's done.
	- Simplified internal links to not be relative.
	- Added a warning about switching VDSM to maintenance mode before
	  upgrading it.
	Signed-off-by: Rob Linden <rlinden@redhat.com>

2022-07-05  Albert Esteve  <aesteve@redhat.com>

	docker: add vdsm user to image
	Create unprivileged vdsm user in the container image
	to enable running tests as non root.

	docker.make: fix push target
	Update the destination url in the push target
	of the docker Makefile.

	docker: update copr ovirt-master repository
	Update the packages installed in the image to
	point to the correct latest releases copr repository.

	Remove qemu-img version requirement. Since we don't
	build in quay.io anymore, we do not need to specify
	it to trigger the build.

2022-07-04  Nir Soffer  <nsoffer@redhat.com>

	betterAsyncore: Fix busy loop after socket is closed
	After a peer closed the other side of the socket, socket.recv() returns
	b"" (empty bytes), but we wrongly checked for "" (empty str). Because we
	did not close the socket, the reactor entered a busy loop calling
	handle_read(). The busy loop probably ends when the reactor tries to
	send a heartbeat and detects that the socket was closed.

	Fixed by checking for b"" and returning b"" to the caller instead of "".

	Before this fix, a single vdsm-client run caused the `JsonRpc
	(StompReactor)` thread to consume about 100% cpu for 22 seconds:

	    vdsm-client Host getStorageRepoStats

	With this fix, we can run vdsm-client in a loop:

	    for n in $(seq 100); do
	        vdsm-client Host getStorageRepoStats > /dev/null;
	    done

	And the `JsonRpc (StompReactor)` thread consumes less than 1% CPU.

	Bug-Url: https://bugzilla.redhat.com/2102678
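
	The fix boils down to comparing against empty bytes; a simplified
	sketch (not vdsm's actual dispatcher code):

```python
# Sketch of an asyncore-style read handler: socket.recv() returns b""
# when the peer closed its side, and "" never equals b"" in Python 3.

def handle_read(sock, close):
    data = sock.recv(4096)
    if data == b"":       # peer closed; the old check `data == ""`
        close()           # never matched, leaving a busy loop
    return data
```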

	betterAsyncore: Always close on handle_close
	Asyncore calls handle_close() when the peer closed the other side of the
	socket. We must close the socket, which removes it from the asyncore map.
	If we don't do that, asyncore may enter a busy loop trying to read or
	write from a closed socket, consuming 100% cpu for many seconds.

	Bug-Url: https://bugzilla.redhat.com/2102678

	betterAsyncore: Log dispatcher lifetime events
	Log a debug message when a dispatcher is created and closed. This can
	help to debug issues after a dispatcher is created or closed.

	Bug-Url: https://bugzilla.redhat.com/2102678

2022-07-04  Eitan Raviv  <eraviv@redhat.com>

	network: nmstate bridge options - test w/ multicast-router
	Multicast-router is understood by nmstate and therefore can be used in
	bridge-opts in tests.

2022-07-04  Harel Braha  <hbraha@redhat.com>

	network: use nmstate for bridge options
	The bridge options are currently implemented via sysfs.
	We want to use nmstate for bridge options instead.

2022-07-04  Eitan Raviv  <eraviv@redhat.com>

	network: require nmstate with bridge options
	Require nmstate > 1.2.1-3 that supports bridge options so that their
	handling can be removed from vdsm.

2022-06-29  Albert Esteve  <aesteve@redhat.com>

	mpathconf: use readline for config read
	In the multipath configuration revision check,
	the whole file is read with readlines(), but
	only the first two lines are used. If the file
	is empty, the list will be empty, causing a possible
	IndexError that needs to be handled.

	All this complexity can be avoided by using readline()
	to read only the two lines needed. It already returns
	an empty string when the end of
	the file is reached, so we don't need to handle any
	exception, simplifying the code and improving
	readability.

	Finally, the lines read used rstrip() to eliminate
	trailing newlines from the string. This can now be
	skipped since we use 'startswith' to check the line,
	which returns as soon as the check is either
	positive or negative, and therefore never
	reaches the newline character.

	storage.multipath: add multipathd reconfigure call
	Avoid the _MULTIPATHD redefinition in tool.multipath
	by moving the reconfigure call to storage.multipath,
	where other multipathd commands are already invoked.

	multipath: move multipathd configuration to mpathconf
	Move multipathd configuration write from the multipath
	configurator to mpathconfig module in storage.

	This makes the configurator module a wrapper for
	vdsm-tool, completely driven by mpathconf, where
	all the logic for checking and writing the
	configuration resides.

	multipath: move conf revision check to mpathconf
	Move the metadata (revision) check that is done in the
	multipath configurator for isconfigured() out to the
	mpathconf module in the storage block.

	Add tests to verify the metadata checker in mpathconf.

2022-06-27  Albert Esteve  <aesteve@redhat.com>

	lvm: disable vg autoactivation in createVG
	Disable the setautoactivation attribute of a VG at
	creation in order to avoid unwanted VG activation.

	Add a test to verify that LVs in a VG cannot be
	autoactivated after creation.

	Fixes #229
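
	The LVM side of this is the `--setautoactivation n` option of
	vgcreate (available in lvm >= 2.03.12); a sketch of building such a
	command line (hypothetical helper, not vdsm's lvm module):

```python
# Sketch: build a vgcreate command with autoactivation disabled.

def vgcreate_cmd(vg_name, devices):
    return ["vgcreate", "--setautoactivation", "n", vg_name] + list(devices)
```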

2022-06-27  Nir Soffer  <nsoffer@redhat.com>

	contrib: Add git-sync tool
	The git-sync tool pushes the current branch to a remote. This is useful
	for testing your changes on a host or for running the tests on a test
	vm.

2022-06-23  Marcin Sobczyk  <msobczyk@redhat.com>

	hook-log-console: Don't require exact vdsm match
	There's no reason to require exactly the same vdsm version
	for the hook to work. It causes issues when installing on top of
	node/rhvh, so let's use a more relaxed requirement.

2022-06-22  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: replace CPU pinning for migrationCreate
	Before the migration is started by libvirt, VDSM will first define the
	VM on the destination host. For that it uses the domain XML passed in
	the VM.migrationCreate call, and in it we need to replace the existing
	CPU pinning with the new CPU pinning. Otherwise the define call may fail
	if the pCPU indices are out of range.

	We already replace pinning in domain XML that we pass to libvirt when we
	initiate the migration (libvirt.VIR_MIGRATE_PARAM_DEST_XML parameter).
	This XML is migration XML and is different (different content, serves
	different purpose) from the XML we pass in VM.migrationCreate. We have
	to replace pinning in both separately.

	Note that in VM.migrationCreate call we pass two XMLs -- one in `xml`
	argument and one in `_srcDomXML`. We replace pinning only in
	`_srcDomXML` because that is the one passed to define call on
	destination. The choice not to replace pinning in `xml` is arbitrary,
	but the rationale may be to avoid doing unnecessary work or to treat
	`xml` as initial configuration from engine.

	Bug-Url: https://bugzilla.redhat.com/2099321

	virt: raise, not return, exception in cpumanagement._siblings
	Bug-Url: https://bugzilla.redhat.com/2099321

2022-06-22  Nir Soffer  <nsoffer@redhat.com>

	contrib/target: Support non-default tpgt
	New targets always use tpgt=1, which makes it hard to simulate issues
	with servers that use unique tpgt tags (e.g. 1034). Add a --tpgt
	argument to allow creating targets with a custom tpgt.

	Here is an example creating a target with a custom tpgt:

	    # ./target create 05 --lun-count 2 --cache --portal 192.168.122.34 --portal 192.168.122.35 --tpgt 50

	    Creating target
	      target_name:   05
	      target_iqn:    iqn.2003-01.org.alpine.05
	      target_dir:    /target/05
	      lun_count:     2
	      lun_size:      100 GiB
	      cache:         True
	      exists:        False
	      portals:       192.168.122.34:3260, 192.168.122.35:3260
	      tpgt:          50

	    Create target? [N/y]: y

	This creates a target with:

	    alpine:~# targetcli ls /iscsi/iqn.2003-01.org.alpine.05
	    o- iqn.2003-01.org.alpine.05 ............................ [TPGs: 1]
	      o- tpg50 .................................... [gen-acls, no-auth]
	        o- acls ............................................. [ACLs: 0]
	        o- luns ............................................. [LUNs: 2]
	        | o- lun0 ... [fileio/05-00 (/target/05/00) (default_tg_pt_gp)]
	        | o- lun1 ... [fileio/05-01 (/target/05/01) (default_tg_pt_gp)]
	        o- portals ....................................... [Portals: 2]
	          o- 192.168.122.34:3260 ................................. [OK]
	          o- 192.168.122.35:3260 ................................. [OK]

	Related-to: https://bugzilla.redhat.com/2097614

	iscsiadm: Consider session exists as success
	If logging in to a target fails with a "session exists" error, consider
	the login successful.

	The issue happens when a host has duplicate portals with the same
	address, port, and iface:

	    # iscsiadm -m node -P1
	    Target: iqn.2003-01.org.alpine.01
	            Portal: 192.168.122.34:3260,1
	                    Iface Name: default
	            Portal: 192.168.122.34:3260,2
	                    Iface Name: default

	iscsiadm tries to log in to both portals, but it cannot create more than
	one session for the same address, port, and iface, so it logs in to one
	of the portals and complains about the second:

	    # iscsiadm -m node -T iqn.2003-01.org.alpine.01 -p 192.168.122.34:3260,1 -l
	    Logging in to [iface: default, target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260]
	    Logging in to [iface: default, target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260]
	    Login to [iface: default, target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260] successful.
	    iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260].
	    iscsiadm: initiator reported error (15 - session exists)
	    iscsiadm: Could not log into all portals

	After the failure, we are left with connected session:

	    # iscsiadm -m session
	    tcp: [27] 192.168.122.34:3260,1 iqn.2003-01.org.alpine.01 (non-flash)

	So while the host iscsi db needs fixing, the host can function normally
	and we don't want to make it non-operational.

	To help users fix the bad configuration, we log a new warning:

	2022-06-20 18:47:23,308+0300 WARN  (iscsi-login/2) [storage.iscsiadm] Duplicate portals
	for target iqn.2003-01.org.alpine.01 iface default portal 192.168.122.34:3260,1:
	Command ... failed with rc=15 out=... err=b'iscsiadm: Could not login to [iface: default,
	target: iqn.2003-01.org.alpine.01, portal: 192.168.122.34,3260].\niscsiadm: initiator
	reported error (15 - session exists)\niscsiadm: Could not log into all portals\n'

	Bug-Url: https://bugzilla.redhat.com/2097614
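
	A sketch of the described handling (function and runner names are
	illustrative; rc 15 is iscsiadm's "session exists" exit code):

```python
import logging

ISCSI_ERR_SESS_EXISTS = 15  # iscsiadm exit code: "session exists"

def node_login(run_iscsiadm, target, portal):
    # Sketch: treat "session exists" as a successful login, but warn
    # so users can fix the duplicate portals in the host iscsi db.
    rc, out, err = run_iscsiadm()
    if rc == 0:
        return True
    if rc == ISCSI_ERR_SESS_EXISTS:
        logging.warning(
            "Duplicate portals for target %s portal %s: %s",
            target, portal, err)
        return True
    return False
```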

2022-06-17  Milan Zamazal  <mzamazal@redhat.com>

	New development cycle: 4.5.2

2022-06-16  Nir Soffer  <nsoffer@redhat.com>

	lvmlocal.conf: Disable multipath_wwids_file usage
	In RHEL 8.6, LVM added detection of multipath components using the
	multipath wwids file (/etc/multipath/wwids). Unfortunately this
	incorrectly detects blacklisted multipath devices. This causes
	vgimportdevices to skip the devices when adding to the devices file, and
	the result is the host failing to boot.

	Since we always use lvm devices or filter, this feature is not helpful
	for our use case, and can be safely disabled.

	Bug-Url: https://bugzilla.redhat.com/2090169
	Related-to: https://bugzilla.redhat.com/2076262

2022-06-15  Albert Esteve  <aesteve@redhat.com>

	storageServer: early return if all conns failed
	If all iscsi connections failed during preparation
	with setup_node(), it does not make sense to continue,
	since logins will be empty. Otherwise, it leads to an
	unrelated, non-useful error when running tmap,
	since max_workers will also be 0.

	We can return early if logins is empty after preparing
	all connections. Errors will not go unnoticed since they
	will have been logged earlier.

	storageServer: fix setup_node error log
	When setting up an iscsi node for a connection fails,
	the error logged had a typo.

	Fix and improve it by adding the reason of the error
	to the logs.

	Fixes: #252

2022-06-15  Yedidyah Bar David  <didi@redhat.com>

	hooks: Add log_console

2022-06-15  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.1.3

2022-06-14  Albert Esteve  <aesteve@redhat.com>

	lvm: remove outdated warning
	This log is left over from times we retried the lvm command
	before failing. Now we run once with specific lvm devices, and again
	with all devices, so this log is not very helpful and can be removed.

	Bug-Url: https://bugzilla.redhat.com/2048545

	lvm: updated lv args to snake_case
	Change variables and parameters in recently updated LV
	methods (e.g., reloadlvs, getlv) to conform to snake_case.

	lvm: reloadlv raise LVDoesNotExistError
	Move the responsibility of raising LogicalVolumeDoesNotExistError
	to LVMCache class to make it consistent with VG error
	handling.

	Avoid raising if no volumes are found when getting all
	the LVs of a VG, as this is wrong. Only raise when looking
	for a specific LV and not finding it. An empty VG is a valid
	case (e.g., a recently created VG), and it is the caller who
	should decide whether finding no LVs is valid or not.

	Fixes: #211

	lvm: split reload LVs for single and all LVs
	Separating the LVMCache._reloadlvs into two
	methods, one for a single LV and one for all LVs
	in a VG, enables different behaviours in the two
	cases, in preparation for LVM error handling.
	This separation also makes the LVs API
	consistent with the VGs API.

	lvm: split LV getter for single or all LVs
	Currently lvm.getLV is multipurposed for single
	and multi LV lookup, which is also replicated
	in LVMCache.getLv method. This makes it difficult
	to handle these two cases properly and raise
	if an error occurs looking up a single LV.

	Split lvm.getLV function into single LV getter,
	that requires an LV name, and a getter for all
	the LVs in a VG, which will only require the VG
	name.

	Replicate the split in LVMCache LV getters.
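
	The split could look roughly like this (a sketch; the real LVMCache
	stores LVs differently and the exception carries more detail):

```python
class LVDoesNotExistError(Exception):
    pass

class LVCache:
    # Sketch of the single/all split; keys are (vg_name, lv_name).
    def __init__(self, lvs):
        self._lvs = lvs

    def get_lv(self, vg_name, lv_name):
        """Single-LV getter: a missing LV is an error."""
        try:
            return self._lvs[(vg_name, lv_name)]
        except KeyError:
            raise LVDoesNotExistError(vg_name, lv_name)

    def get_all_lvs(self, vg_name):
        """All-LVs getter: an empty VG is a valid result."""
        return [lv for (vg, _), lv in self._lvs.items() if vg == vg_name]
```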

	lvm: extract helper for reloading lvs
	Refactor the LVMCache._reloadlvs method similarly
	to how other reload methods in the class are
	structured, doing separately:
	- Run the LVM command
	- If the LVM command fails, try to reload stale LVs
	- Otherwise, update the cached LVs

2022-06-14  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: Replace CPU pinning when resuming snapshot with memory
	When resuming a live snapshot with memory the domain XML contains CPU
	pinning that has been used at the time the snapshot was taken. Such
	pinning is likely to be invalid when the snapshot is resumed. Engine
	passes new CPU pinning configuration in API and it should be honored.

	virt: fix graphics configuration when resuming from hibernation
	Somehow libvirt drops the `passwdValidUntil` attribute from domain XML
	when we pass XML that has no `passwd` attribute to restoreFlags(). The
	same happens when restoring from a snapshot with memory, and we can use
	the same method to mitigate the problem.

	An alternative would be to pass `VIR_DOMAIN_SAVE_IMAGE_XML_SECURE` flag
	to `saveImageGetXMLDesc()` call. That way libvirt would not hide the
	`passwd` from us. But since our passwords to graphics console are
	temporary this does not add any value. Also we would leak the console
	password to the logs which is not favorable even though the password is
	likely to be invalid at the time of resume.

2022-06-09  Eitan Raviv  <eraviv@redhat.com>

	network: nmstate-plugin-ovsdb dependency not needed on RHEL 9
	From RHEL 9.0 onwards nmstate-plugin-ovsdb is part of nmstate so it is
	no longer required as a dependency of vdsm.

	See also: https://github.com/oVirt/vdsm/issues/219

	Bug-Url: https://bugzilla.redhat.com/2091581

2022-06-08  Nir Soffer  <nsoffer@redhat.com>

	check: Fix fd leak when child process fails
	When using Popen.communicate() the process pipes are closed when the
	child closes its end of the pipe. But in the check module we run
	multiple processes in the same thread, and we read from the pipe using
	asynchronous I/O. In this case we never close the child process pipe.
	This should not be an issue, since the pipe should be closed
	automatically when we remove the reference to the process, and we never
	noticed any file descriptor leak in this code.

	However we see a clear leak when a storage domain becomes
	inaccessible. Strangely, only failing checks from inaccessible storage
	domains leak, and we see a very clear leak of one file descriptor every
	10 seconds:

	    for i in $(seq 3600); do
	        echo "$(date --rfc-3339=seconds), $(readlink /proc/$(pidof -x vdsmd)/fd/* | wc -l)"
	        sleep 10
	    done
	    ...
	    2022-06-07 22:52:14+03:00, 374
	    2022-06-07 22:52:24+03:00, 375
	    2022-06-07 22:52:34+03:00, 376
	    2022-06-07 22:52:44+03:00, 377
	    2022-06-07 22:52:54+03:00, 378
	    2022-06-07 22:53:04+03:00, 379
	    2022-06-07 22:53:14+03:00, 379
	    2022-06-07 22:53:24+03:00, 380
	    2022-06-07 22:53:34+03:00, 381
	    2022-06-07 22:53:44+03:00, 384
	    2022-06-07 22:53:54+03:00, 383
	    2022-06-07 22:54:04+03:00, 384
	    2022-06-07 22:54:14+03:00, 385
	    2022-06-07 22:54:24+03:00, 386
	    2022-06-07 22:54:34+03:00, 387
	    2022-06-07 22:54:44+03:00, 388
	    2022-06-07 22:54:54+03:00, 388
	    2022-06-07 22:55:04+03:00, 389

	I cannot explain why this happens only when the process fails and we
	read an error message from stderr.

	Fix by explicitly closing the child process stderr pipe when we finish
	reading from it.

	Bug-Url: https://bugzilla.redhat.com/2075795

	tests: Test that DirectioChecker does not leak fds
	We see a file descriptor leak when a check fails. Add failing tests for
	a missing path and a working path reproducing the leak.

	In the environment reproducing this, we don't see a leak for successful
	checks, only for failing checks. In the tests we see the leak in both
	cases.

	Bug-Url: https://bugzilla.redhat.com/2075795

2022-06-08  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.1.2

2022-06-07  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: use proper CPU pinning when resuming hibernated VM
	When hibernating a VM we also store the CPU pinning. This pinning needs
	to be invalidated when resuming the VM because in most cases (except
	when manual pinning is involved) the pinning is no longer usable either
	because shared pool needs to be adapted or dedicated CPUs have changed.

	New pinning passed from engine needs to be used instead.

	Domain XML that is used to restore the VM is loaded from libvirt's
	hibernation image which replaces XML passed from API call and that goes
	through init. This should not lead to any change in behavior. While we
	used the passed XML to create the domain it has been always immediately
	replaced by libvirt in the resume call. That means if we are doing any
	changes to the XML during early init such changes were always lost and
	will still remain lost. It may be tempting to pass domain XML from
	engine or from our pickled data to `resumeFlags()`. Neither is possible
	because our domain XML is not migration XML. Doing so would end in error
	because libvirt would fail to match the guest specific portion with the
	stored domain XML.

	Bug-Url: https://bugzilla.redhat.com/2083302

	virt: refactor vcpupin replacement on migration
	Bug-Url: https://bugzilla.redhat.com/2083302

2022-06-07  Eitan Raviv  <eraviv@redhat.com>

	network: centos-9 disable functional tests
	Disable functional tests until all related issues are solved.

2022-06-04  michalskrivanek  <michal.skrivanek@redhat.com>

	Mark temp repo directory as safe for COPR (#232)
	When building from COPR the project is cloned into a temporary
	directory, which is not owned by the current user. From git 2.35.2 such
	a directory needs to be marked as safe in order for git commands to work
	correctly.

2022-06-03  Nir Soffer  <nsoffer@redhat.com>

	lvmlocal.conf: Bump revision to fix upgrade
	When we change the content of the lvmlocal.conf file, we must bump the
	revision. This is used by the lvm configurator to upgrade the file when
	running `vdsm-tool configure` during upgrade.

	Fixes: 4ec21db8ae83 (lvmlocal.conf: remove global section)

2022-06-03  Albert Esteve  <aesteve@redhat.com>

	lvmlocal.conf: remove global section
	The event_activation option set to 0 in the global
	section of the lvmlocal.conf is causing el9 nodes
	to fail to boot after being installed by the engine.

	LVM removed support for disabling event_activation in rhel9.
	Existing systems with event_activation=0 that update lvm
	will no longer have static autoactivation services, and
	may fail to boot (if an LV required by fstab was not
	activated by the initrd).

	Removing the option solves the issue. The option was
	not correct for el8 systems either; it is discouraged
	since it is not reliable (it may appear to work
	based on the timing of device attachment). Thus, this
	option can be safely removed.

	The global section is left with no options whatsoever,
	so it can be completely removed alongside the
	event_activation configuration.

	Related-to: https://bugzilla.redhat.com/2038183

2022-06-02  Albert Esteve  <aesteve@redhat.com>

	development.md: document upgrade target
	Update development documentation with the new `upgrade`
	make target available.

	makefile: add upgrade target
	Add 'upgrade' target to makefile, to upgrade
	the installation based on RPMs found in the
	project-specific RPM top directory.

	With the new target, to build rpms and upgrade the
	system with them we can now do:

	    make rpm upgrade

	Or we can simply upgrade with the already built rpms:

	    make upgrade

	makefile: define project-specific rpm dir
	Make the targets that invoke the rpmbuild command
	build into a vdsm-specific rpmbuild directory.

	Add a clean-local target so that the folder gets
	cleaned with:

	    make clean

	Add the new rpmbuild folder to gitignore.
	Fix the destination also in the CIs rpm script.

	storage.exception: LVMError in LVDoesNotExistError
	Make the LogicalVolumeDoesNotExistError exception inherit from
	the _HoldingLVMCommandError exception, so that we allow logging the
	error and make the exception more informative.

	storage.exception: add _HoldingLVMCommandError
	Add _HoldingLVMCommandError to act as a parent class for all
	StorageExceptions that occur as a consecuence of an LVM
	command error. This class is not meant to be directly
	instantiated.

	The exception holds an Optional LVMCommandError, validates
	the parameter in the constructor, and overrides the `value`
	property to log information consistently.
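
	A rough sketch of the hierarchy (class contents are illustrative;
	the real vdsm exceptions carry error codes and messages):

```python
class LVMCommandError(Exception):
    def __init__(self, rc, err):
        self.rc = rc
        self.err = err

class _HoldingLVMCommandError(Exception):
    """Parent for storage errors caused by an LVM command failure.

    Not meant to be instantiated directly.
    """
    def __init__(self, *args, error=None):
        if error is not None and not isinstance(error, LVMCommandError):
            raise TypeError("error must be an LVMCommandError")
        self.error = error
        super().__init__(*args)

    @property
    def value(self):
        # Include the held LVM error for consistent logging.
        base = " ".join(str(a) for a in self.args)
        if self.error is not None:
            base += " (rc=%s err=%s)" % (self.error.rc, self.error.err)
        return base

class LogicalVolumeDoesNotExistError(_HoldingLVMCommandError):
    pass
```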

2022-06-02  Nir Soffer  <nsoffer@redhat.com>

	volume: Require create for qcow2 image without a parent
	Previously we required create=True only for qcow2 images using
	compat=0.10 that do not support zero clusters. This issue is not
	relevant since RHEL 8.4.

	Testing conversion of a big qcow2 image (8 TiB) shows that writing
	zero clusters is not fast enough for big images. When qemu-img convert
	creates the target image, it knows that the image is zeroed, so it can
	skip unallocated areas instead of zeroing them.

	Testing copy of an 8 TiB empty image shows that without -n the copy is
	700 times faster:

	    # time qemu-img convert -f qcow2 -O qcow2 -t none -T none -n src dst

	    real    2m1.839s
	    user    0m10.383s
	    sys     0m7.563s

	    # time qemu-img convert -f qcow2 -O qcow2 -t none -T none src dst

	    real    0m0.171s
	    user    0m0.111s
	    sys     0m0.011s

	With real images we see a much lower speedup. Here is an example copy
	with an 8 TiB image created with virt-builder:

	    # virt-builder fedora-35 \
	        --output fedora-35-8t.raw \
	        --hostname=fedora-35 \
	        --ssh-inject=root \
	        --root-password=password:root \
	        --selinux-relabel \
	        --install=qemu-guest-agent \
	        --size=8192G
	        ...
	                       Output file: /var/tmp/fedora-35-8t.raw
	                       Output size: 8192.0G
	                     Output format: raw
	                Total usable space: 8192.0G
	                        Free space: 8133.1G (99%)

	    # qemu-img info /var/tmp/fedora-35-8t.raw
	    image: /var/tmp/fedora-35-8t.raw
	    file format: raw
	    virtual size: 8 TiB (8796093022208 bytes)
	    disk size: 1.5 GiB

	    # qemu-img measure -O qcow2 /var/tmp/fedora-35-8t.raw
	    required size: 3228303360
	    fully allocated size: 8797435527168

	The image was converted to qcow2 format on a block volume:

	    # time qemu-img convert -f raw -O qcow2 -t none -T none /var/tmp/fedora-35-8t.raw src

	    real 0m35.979s
	    user 0m3.303s
	    sys 0m2.179s

	In qcow2 format the image uses 2.2 GiB:

	    # qemu-img check src
	    No errors were found on the image.
	    28746/134217728 = 0.02% allocated, 0.00% fragmented, 0.00% compressed clusters
	    Image end offset: 2314469376

	Creating a new qcow2 image on the destination block volume:

	    # qemu-img create -f qcow2 dst 8t
	    Formatting 'dst', fmt=qcow2 cluster_size=65536 extended_l2=off
	    compression_type=zlib size=8796093022208 lazy_refcounts=off
	    refcount_bits=16

	Copying image from block volume to block volume with -n:

	    # time qemu-img convert -f qcow2 -O qcow2 -t none -T none -n src dst

	    real 3m7.006s
	    user 0m18.817s
	    sys 0m11.929s

	Same copy without -n (creating new qcow2 image):

	    # time qemu-img convert -f qcow2 -O qcow2 -t none -T none src dst

	    real 0m59.822s
	    user 0m7.632s
	    sys 0m5.197s

	Change Volume.requires_create() to return True for qcow2 images without
	a parent.

	Testing creating a VM from a template with the image tested above shows
	that copy time decreased from 280 to 95 seconds (2.9 times faster).

	Fixes: #221
	Fixes: 2bbe66274524 (volume: Be more careful with create=False)
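
	The new rule can be sketched as a simple predicate (the constants and
	the helper signature are illustrative, not the vdsm Volume API):

```python
# Illustrative sketch of the new requires_create() rule; the format
# constants and the helper signature are assumptions.
RAW = "raw"
COW = "cow"

def requires_create(fmt, has_parent):
    # qemu-img convert without -n creates the target image itself; it
    # knows the new image is zeroed, so it can skip unallocated areas
    # instead of writing zero clusters. This applies to qcow2 volumes
    # without a parent (no backing chain).
    return fmt == COW and not has_parent
```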

2022-06-02  Milan Zamazal  <mzamazal@redhat.com>

	virt: Fix long VNC passwords in vm_libvirt_hook.py
	Newer versions of libvirt accept VNC passwords with a maximum length
	of 8 characters, because QEMU uses only the first 8 characters anyway.
	We have already fixed the password length in Engine, but if we migrate
	VMs created by older Engines, they may fail to start on the
	destination due to the long VNC password.

	Let’s make those VMs migratable to newer hosts by fixing the password
	in the libvirt hook.  We can simply remove the extra unused characters
	from the password to make libvirt happy.

	Bug-Url: https://bugzilla.redhat.com/2090156
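
	The hook's fix can be sketched roughly like this (a hedged sketch
	assuming the hook edits the domain XML; the <graphics type='vnc'
	passwd='...'/> schema is libvirt's, the helper itself is hypothetical):

```python
# Hedged sketch of the VNC password fix; the <graphics> element and
# passwd attribute follow libvirt's domain XML schema, the helper
# itself is hypothetical.
import xml.etree.ElementTree as ET

# QEMU uses only the first 8 characters of a VNC password.
MAX_VNC_PASSWORD_LENGTH = 8

def fix_vnc_password(domain_xml):
    root = ET.fromstring(domain_xml)
    for graphics in root.iter("graphics"):
        passwd = graphics.get("passwd")
        if graphics.get("type") == "vnc" and passwd is not None:
            # Remove the extra unused characters to make libvirt happy.
            graphics.set("passwd", passwd[:MAX_VNC_PASSWORD_LENGTH])
    return ET.tostring(root, encoding="unicode")
```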

	tests: virt: Switch to pytest in vm_libvirt_hook_test.py

	virt: Remove Python 2 support from vm_libvirt_hook.py and its test
	Bug-Url: https://bugzilla.redhat.com/2090156

2022-06-01  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.1.1

	virt: Prolong migration listener timeout by disk preparation
	When a migrating VM is being prepared on the destination host, some
	disks require modifying and reloading udev rules on the host.  This
	can take a significant amount of time and if the VM has a lot of such
	disks then a migration preparation timeout may occur.

	The right solution would be to reload the modified rules only once for
	all the disks.  But this would require some more work to implement.
	At the moment, let’s make a workaround prolonging the migration
	preparation timeout for each such disk, i.e. a disk with a
	/dev/mapper/… source path.  But not beyond the newly configurable
	max_migration_listener_timeout.

	We add 2 seconds per drive by default.  This is an arbitrarily
	selected value that would work well in an environment where the
	problem was observed.  If this value doesn’t work well enough in some
	environment, it can be overridden with another newly introduced
	configuration value, migration_listener_prepare_disk_timeout.

	Bug-Url: https://bugzilla.redhat.com/2055905
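
	The timeout computation described above can be sketched as follows
	(the option names follow the commit text; the default values and the
	helper signature are assumptions):

```python
# Sketch of the prolonged migration listener timeout; the option names
# follow the commit message, the defaults and signature are assumptions.
def migration_listener_timeout(base_timeout, drive_paths,
                               prepare_disk_timeout=2,
                               max_timeout=600):
    # Only disks with /dev/mapper/... source paths require modifying
    # and reloading udev rules during preparation.
    slow_disks = sum(
        1 for path in drive_paths if path.startswith("/dev/mapper/"))
    timeout = base_timeout + slow_disks * prepare_disk_timeout
    # Never prolong beyond max_migration_listener_timeout.
    return min(timeout, max_timeout)
```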

	virt: Use snake case in incoming migration identifiers

2022-05-30  Albert Esteve  <aesteve@redhat.com>

	lvm: updated pv args to snake_case
	Change variables and parameters in recently updated
	PV methods (e.g., reloadpvs, getPV) to conform to snake_case.

	lvm: reloadpv raise InaccessiblePhyDev
	Currently the PV getter raises InaccessiblePhyDev
	if None is detected, which is the default return value
	of the PVs dictionary when the key does not exist.

	Move the raise to the reload method, so that we raise
	as soon as the missing PV is detected and we leave
	the responsibility of raising always to the LVMCache,
	consistent with how VG errors are handled.

	lvm: split reload pvs for single and multi PVs
	Separating LVMCache._reloadpvs into two
	methods, one for a single PV and one for multiple
	PVs, enables different behaviour in each case.

	Also, this separation makes the PV handling
	consistent with how VGs are handled, and avoids
	overloading the method for different parameter
	types.

	lvm: extract helper for reloading pvs
	Refactor LVMCache._reloadpvs similarly to how
	_reloadvgs is structured, with different methods for:
	- Running the LVM command
	- Updating stale PVs after LVM errors
	- Updating the cache based on the pvs command output
	  if no errors occurred

	storage.exception: add error to InaccessiblePhysDev
	Add 'error' attribute to InaccessiblePhysDev exception,
	to allow logging the error and make the exception more informative.

2022-05-26  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Fix over extend
	When extending the base volume before merge, we used a dumb
	calculation extending the base volume by top_size + chunk_size. This
	allocates way too much space, which is typically not needed. For
	active layer merge, there is no way to reduce the volume after the
	merge without shutting down the VM. The result is growing the active
	volume on every merge, until it consumes the maximum size.

	Fix the issue by measuring the sub-chain from top to base before the
	extend. This gives the exact size needed to commit the top volume into
	the base volume, including the size required for the bitmaps that may
	be in the top and base volumes.

	In the case of active layer merge, this measurement is a heuristic,
	since the guest can write data during the measurement, or later during
	the merge. We add one chunk of free space to minimize the chance of
	pausing a VM during merge. The only way to prevent pausing during
	merge is to monitor the base volume block threshold during the merge.
	This was not possible in the past and can be done with current
	libvirt, but vdsm thin provisioning code is not ready for this yet.

	For internal merge, measuring is exact, and there is no need to leave
	free space in the base volume since the top volume is read only.

	Fixes: #188
	Related-to: https://bugzilla.redhat.com/1993235
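
	The old and new size calculations can be contrasted in a short sketch
	(all names are illustrative, not the actual vdsm code):

```python
# Sketch contrasting the old and new extend-size calculations for live
# merge; all names are illustrative, not the actual vdsm code.
def old_merge_extend_size(base_size, top_size, chunk_size):
    # The dumb calculation: typically allocates way too much space.
    return base_size + top_size + chunk_size

def new_merge_extend_size(measured_size, chunk_size, active_layer):
    # measured_size is the result of measuring the sub-chain from top
    # to base: the exact size needed to commit top into base,
    # including space for bitmaps.
    if active_layer:
        # The guest may write during the measurement or the merge;
        # add one chunk of free space to minimize the chance of
        # pausing the VM.
        return measured_size + chunk_size
    # Internal merge: top is read-only, the measurement is exact.
    return measured_size
```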

	vm: Simplify extend_volume()
	Previously we accepted the current and maximum size of the volume, and
	computed the next volume size, including one chunk of free space, and
	ensuring that the size is limited by the volume capacity and qcow2
	overhead.

	This works for auto extending thin volume and for pre extend during
	replication, but it does not work correctly for internal live merge,
	when we don't want to add one chunk of free space. It is also ugly
	that in active layer merge we have to lie and call with the maximum
	required size as the "current" size.

	Change extend_volume() to accept the new size, moving the responsibility
	to the caller. The 3 callers were updated to compute the size using
	drive.getNextVolumeSize().

	vm: Add measure() storage helper
	The new method calls HSM.measure() to measure the required size for a
	sub-chain. This will be used to measure the required size before live
	merge.

	The fake IRS emulates measure by returning fake values from a dict.
	Tests will add entries to this dict to simulate guest writes.

	hsm: Fix dest_format type
	The docstring wrongly describes it as str, but unfortunately this is
	an integer.

2022-05-26  Benny Zlotnik  <bzlotnik@redhat.com>

	managedvolume: add support for lightos
	Add lightos[1] driver to supported drivers as it now works.

	[1] https://docs.openstack.org/cinder/latest/configuration/block-storage/drivers/lightbits-lightos-driver.html

	managedvolume: extract supported drivers
	Move supported drivers hardcoded tuple to a variable to make it easier
	to add new drivers in the future.

2022-05-25  Benny Zlotnik  <bzlotnik@redhat.com>

	clientIF: use path as-is if drive is managed
	Currently we check for the presence of GUID or RBD metadata keys when
	constructing the path for the VM.

	Since we now start to use the vdsm-managed link, we can just use it
	as-is instead.

	udev: adjust udev rule for managed volumes
	The udev rule file name will now use the format for managed volumes:
	99-vdsm-managed_{sd_id}_{vol_id}.rules

	The content of the rule will vary depending on whether we use a link
	or the actual device, for example:

	  $ cat /etc/udev/rules.d/99-vdsm-managed_fa728e08-2db4-4016-bfbb-bd75f2ee1735_876fa465-d92b-4070-833d-c5ec4331ceb5.rules
	  SYMLINK=="rbd/volumes/volume-876fa465-d92b-4070-833d-c5ec4331ceb5", RUN+="/usr/bin/chown vdsm:qemu $env{DEVNAME}"

	  $ cat /etc/udev/rules.d/99-vdsm-managed_fc3b802e-cbc5-4896-995d-87f4b25a1a68_0d850fc9-9dfe-44ee-a4f6-a209d1f4bf7b.rules
	  SYMLINK=="mapper/20024f4005854000a", RUN+="/usr/bin/chown vdsm:qemu $env{DEVNAME}"

	managedvolume: create link for MBS device
	The current code for attaching MBS devices tries to construct a stable
	path. This works for RBD and mpath devices, but it doesn't scale for
	drivers that do not fall into this category, for instance NVMe/TCP
	drivers.

	This patch creates a link for the path returned by os-brick. This link
	will be managed by vdsm and will be used by engine as the path to run
	VMs.

	Example:
	lrwxrwxrwx. 1 vdsm kvm 60 May  3 10:21 /var/run/vdsm/managedvolumes/fa728e08-2db4-4016-bfbb-bd75f2ee1735_876fa465-d92b-4070-833d-c5ec4331ceb5 -> /dev/rbd/volumes/volume-876fa465-d92b-4070-833d-c5ec4331ceb5
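
	The managed link can be sketched like this (the directory and the
	"{sd_id}_{vol_id}" naming scheme follow the example above; the helper
	itself is hypothetical):

```python
# Hedged sketch of the managed-volume link creation; the directory and
# the "{sd_id}_{vol_id}" naming follow the example above, the helper
# itself is hypothetical.
import os

LINK_DIR = "/var/run/vdsm/managedvolumes"

def create_managed_link(sd_id, vol_id, attachment_path, link_dir=LINK_DIR):
    # The link is managed by vdsm and used by engine as the stable
    # path to run VMs, regardless of the os-brick device path.
    path = os.path.join(link_dir, f"{sd_id}_{vol_id}")
    os.symlink(attachment_path, path)
    return path
```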

	managedvolume: add sd_id to detach_volume
	Add sd_id to managedvolume.detach_volume to ensure we can remove the
	correct volume in case multiple volumes with the same id are present
	(live storage migration for example).

	managedvolume: add sd_id to attach_volume
	Add sd_id to attach_volume to ensure it will be possible to uniquely
	identify a volume if two volumes with the same id are attached to the
	same host (during live storage migration for example).

2022-05-25  Albert Esteve  <aesteve@redhat.com>

	lvm: support only reloading all LVs
	LVMCache._reloadlvs was coded to support reloading
	a single LV, a subset of all LVs in the VG, or all
	of them. However, it is only used for reloading all,
	since the cost is roughly the same.

	Supporting only the case that is used, where all LVs
	are reloaded, avoids the unnecessary complexity
	and avoids overloading the function signature with
	different parameter types.

	lvm: remove renameLV
	Remove lvm.renameLV since it is not used anymore.

	Fixes: 780d1f72c57add2a7ba27e56662d47dd16185584

2022-05-24  Albert Esteve  <aesteve@redhat.com>

	tox.ini: allow newer black version to match click
	The latest click versions are not compatible with the black
	version used by tox (21.6b0). To fix it we either
	have to move to a more recent version of black, which
	currently breaks the linter in some tests, or downgrade
	the version of click to 8.0.2.

	Update black version to a more recent one (22.3.0) and fix
	linting issues that come with the change.

2022-05-23  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: store some guest stats during migration
	We used to transfer some guest statistics to the destination host
	before migration. This disappeared from the code for non-obvious
	reasons while moving to domain XML on engine (commit 88c6922f). The
	stored statistics were username (logged-in user), guestFQDN (hostname)
	and guestIPs (network addresses). This commit restores the transfer of
	username and guestFQDN. Instead of guestIPs we transfer netIfaces,
	which provides more detailed information about interfaces, and also
	because guestIPs has not been used in engine since 4.2.

	Bug-Url: https://bugzilla.redhat.com/1853897

2022-05-23  Albert Esteve  <aesteve@redhat.com>

	development.md: update make venv
	Update #environment-setup section to use the new
	make 'venv' target instead of the command list
	to create a new Python virtual environment.

	Makefile: add venv target
	Add target to create python venv in the main Makefile.

2022-05-18  Albert Esteve  <aesteve@redhat.com>

	blockSD: check can_reduce before reduceVolume
	There is no safety check before reducing a volume
	in the StorageDomain class. This could lead to
	wrong reductions in the case of, e.g., raw or preallocated
	volumes.

	Use the vol.can_reduce method before reducing the volume.

	Fixes: #185

	merge: check can_reduce in a cold merge
	When finalizing a merge, only the format (cow)
	and the optimal_size are considered to discern when to
	reduce a volume. However, calculating the optimal size can
	be avoided in the case of a preallocated block volume, as it
	cannot be reduced.

	Use the volume can_reduce check before obtaining the optimal
	size, and before reducing the volume, so that we consider the
	type of a volume before reducing it.

	Bug-Url: https://bugzilla.redhat.com/2081493

	lvm: improve error handling for single vg getter
	Currently, if we try to reload a non-existing VG that is not
	in the cache, we do not get any useful error; an exception is
	raised without any context about the actual reason for the failure.

	Furthermore, if the LVM command fails, it might go unnoticed,
	or trigger a single uninformative warning without the reason
	why it might have failed.

	Make _reload_single_vg raise an exception if the LVM command
	failed and there is no output or the VG is stale (not updated).
	Otherwise it returns the single VG that has been updated.
	LVMCache.getVg() will propagate the exception.

	lvm: split reloadvgs for single and multi VGs
	LVMCache._reloadvgs is used the same way both when
	reloading a single VG, and when reloading a set of VGs
	or all VGs in the cache.

	This makes it difficult to have tailored handling (e.g., error
	handling) of reloading VGs for those cases. It also forces
	the method to be overloaded, supporting different parameter
	types and increasing complexity.

	Specialize _reloadvgs for single VG invocation to
	allow different behaviour and avoid function overloading.

	lvm: extract helper for reloading vgs
	LVMCache._reloadvgs has two main purposes:
	- Build and run an LVM command
	- Update the cache based on the LVM command output

	This results in a relatively big method that could be
	refactored to follow the single-responsibility principle.

	Extract update VGs and update stale VGs helpers
	from _reloadvgs.

	lvm: simplify listPVnames
	listPVnames first tries to get the VG directly from
	the cache in lvminfo, without using the getter, thus
	ignoring the case of a stale or unreadable VG.

	It then tries to catch those cases with an except block
	catching an AttributeError or a KeyError, and recovers
	from the issue by calling getVG.

	However, we should simply use getVG from the get-go,
	which already handles a missing, stale or unreadable VG,
	and tries to reload it from LVM. If it fails it will
	raise a VolumeGroupDoesNotExist.

	Also, VGBlockSizes methods were checking missing PVs
	in a VG after calling listPVnames, and raising a
	VolumeGroupDoesNotExist if empty.

	But there is no such thing as a VG without PVs (pv_name).
	We either find a VG with a valid pv_name, or listPVnames
	raises an exception, and therefore we can remove that part.

	lvm: rename VGBlockSizes method's param to vg_name
	In lvm, the VG block size getter and checker have
	a vgUUID parameter that is wrongly named. It should be
	either `sdUUID` or `vgName`. This made the code misleading
	about the way it was supposed to be used.

	storage.exception: add error to VolumeGroupDoesNotExist
	Add 'vg_name', 'vg_uuid', and 'error' attributes to VolumeGroupDoesNotExist
	exception, to be able to properly log the error and make it informative.

2022-05-18  Ales Musil  <amusil@redhat.com>

	net: Show IB interface in capabilities
	The IB interface was filtered out as it was not considered
	to be a NIC-like interface. Add the ipoib type to the
	NIC-like interfaces.

	Bug-Url: https://bugzilla.redhat.com/2081359

2022-05-17  Albert Esteve  <aesteve@redhat.com>

	blockSD: avoid argument shadowing in validatePVs
	BlockStorageDomainManifest._validatePVsPartOfVG receives
	a 'pv' argument that is shadowed in a dict comprehension
	in the first line of the method. It might be safe as it
	is contained in the comprehension only, but shadowing
	should be avoided anyway.

	Also, 'pv' is wrong in the comprehension, since listPVNames
	will return a PV path (not a PV namedtuple). So changing
	'pv' to 'path' in the comprehension will both fix the
	argument shadowing and use proper informative naming.

2022-05-16  Nir Soffer  <nsoffer@redhat.com>

	merge: Fix reducing after merge
	When reducing volume after an active merge, we reduced the base volume
	too much, leaving no free space. When starting a VM after the merge, the
	VM is very likely to pause with ENOSPC when the guest is writing to the
	merged disk.

	Fix by calculating the optimal size correctly, using the new as_leaf
	argument. With this fix, the leaf volume after a merge always has one
	chunk of free space (2.5g), and it behaves like a new volume or
	snapshot.

	Fixes: #179

	blockVolume: Calculate optimal size as leaf
	When finalizing a merge we calculate the optimal size for the base
	volume in the wrong way, since optimal_size() is correct only after the
	top volume was deleted, and the base volume becomes a leaf.

	Add as_leaf argument to optimal_size(). When it is True, we calculate
	the optimal size as if the volume is a leaf. Finalize merge will use
	this to calculate correct size when finalizing an active merge.

2022-05-16  Albert Esteve  <aesteve@redhat.com>

	volume: add can_reduce wrapper for Volume class
	Add a 'can_reduce' wrapper method in the Volume class,
	calling the VolumeManifest.can_reduce method, so that
	it can be invoked directly from both types.

	volume: add 'can_reduce' method
	Currently there is no check performed from within
	the volume class before a reduce is attempted. This may
	result in reduce calls with the wrong volume type (i.e., preallocated).

	In preparation, provide a 'can_reduce' method in
	the volume base class, so that it can be called from the
	appropriate place before a volume is reduced.

2022-05-16  Nir Soffer  <nsoffer@redhat.com>

	blockVolume: Improve cow_optimal_size() comments
	Make the comment about aligning the optimal size to the class
	align_size clearer.

	merge: Fix over allocation when preparing
	When preparing a merge, we calculate the new size of the base volume in
	a dumb way:

	    new_size = base_size + top_size + chunk_size

	In the best case, when merging a new empty snapshot, this allocates 2
	extra chunks (5 GiB). In the worst case, when top is big but all the
	clusters in the top volume exist in the base volume, this can allocate
	a huge amount of space which is not needed.

	The extra allocation is fixed after the merge, when we shrink the volume
	to optimal size.

	The extra allocation during the merge can cause:
	- Failure to merge if the required allocation is more than the free
	  space in the storage domain.
	- VMs can pause without a way to resume them if there is no space to
	  extend thin disks.
	- Other storage operations may fail because there is not enough free
	  space in the storage domain.

	Fix the over allocation by measuring the required size for the merge,
	and calculating the optimal size for the volume based on the required
	size.

	Since we calculate the optimal size before the merge, we can skip the
	shrinking to optimal size at the end of the merge if the volume size
	after the merge is optimal. This makes the merge more robust and a
	little faster.

	Since current merge tests use fake block volumes, we must mock
	qemuimg.measure() for testing leaf volumes. This should be improved
	later by using a real block storage domain.

	Fixes: #134

	tests: Fix block volume tests fake chain
	The tests created an invalid chain when the base volume is a leaf
	volume. This error was hidden since the code does not optimize for
	internal merge, and always adds one chunk to the base.

	Change make_env() so the caller can specify if the top volume is a leaf.
	In this case the base volume will become the leaf after the merge.

	Change the tests to create the Volume and Expected named tuples using
	key=value style. This makes the test setup much easier to follow.

	blockVolume: Extract optimal_cow_size()
	Before merging into base volume, we need to compute the optimal size of
	the volume by measuring the volume, and adding free space. The
	calculation is done today in BlockVolume.optimal_size(), but this uses
	the actual size of the qcow2 image.

	Extract BlockVolume.optimal_cow_size() accepting the required size, so
	we can use the same calculation for the base volume before merge.

	blockVolume: Remove minimum internal volume size
	When computing the size of an internal volume, we rounded up to 1 GiB.
	I'm not sure why we did this, since it does not help anything. The
	only limit we have is the LVM extent size. Adding extra limits only
	complicates the code and makes it harder to understand the system
	behavior.

	blockVolume: Remove MIN_PADDING
	When we added optimal_size() we added minimum 1 MiB padding for internal
	volumes. I'm not sure why we added this, but it seems useless and it
	complicates the code for no benefit.

	blockVolume: Always align optimal size to extent size
	Previously the optimal size was sometimes aligned to the extent size,
	and sometimes it was not. Using an unaligned optimal size to extend or
	reduce is fine, since LVM rounds the size up, but it prevents callers
	from comparing the optimal size (sometimes aligned) with the actual
	size (always aligned).

	blockVolume: Clean up optimal_size()
	Clean up the code to make it easier to follow and change:

	- Use config.getint() instead of int(config.get())
	- Eliminate unneeded clumsy temporary variables (potential_optimal_size)
	- Improve variables names to explain their purpose (free_space)
	- Improve comments to explain why the code is doing what it does
	- Remove the implementation details from the docstring, so we don't have
	  to modify the docstring when we modify the implementation.

	tests: Fix optimal size tests for internal volume
	When testing an internal volume, the tests used the actual allocation
	as the virtual size, so all 3 tests tested the same case, where the
	optimal size is clipped down to the maximum size.

	Fix by specifying also the virtual size.

	tests: Improve optimal size tests for leaf volume
	The first test checked the case when the optimal size is actual_size +
	chunk_size. But because the test used a 1 GiB virtual size, the result
	may be clipped by the maximum volume size. Change the test to use 2
	GiB.

	The second test checks the case when the size is limited by the
	maximum size, by writing 200 MiB to the volume. The test was correct,
	but since it writes a lot of data it was defined as a slow test, so it
	does not run in the CI. Change the test to use a 512 MiB volume, so we
	don't need to write anything to test that the size is limited by the
	maximum size, and we can run the tests everywhere.

2022-05-16  Albert Esteve  <aesteve@redhat.com>

	storage.compat: remove module
	This module used to handle missing ioprocess
	and sanlock libraries in Python 3, and provided a workaround
	to support older sanlock versions that used 'async='
	instead of 'wait='.

	However, it is not relevant anymore.

	Fixes: #56

2022-05-13  Tomáš Golembiovský  <tgolembi@redhat.com>

	qemuguestagent: store CPU count reported by guest
	oVirt Guest Agent has been reporting the CPU count from the guest to
	oVirt, but with the deprecation of the agent this information was
	lost. QEMU Guest Agent has a command for reporting detailed statistics
	about guest CPUs that we can use to restore the functionality.

	Bug-Url: https://bugzilla.redhat.com/2077008

2022-05-12  rchikatw  <55136127+rchikatw@users.noreply.github.com>

	gluster: fix compatibility with Gluster v10 or greater
	In Gluster v10 or greater the stripe count is removed. When the
	stripe count is not returned from the volume info command, set
	stripeCount to the default value, which is 1.

	Bug-Url: https://bugzilla.redhat.com/2078569
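
	The fallback can be sketched in a few lines (the volume-info dict
	shape is an assumption based on the commit text):

```python
# Minimal sketch of the stripe count fallback; the volume-info dict
# shape is an assumption based on the commit text.
DEFAULT_STRIPE_COUNT = "1"

def stripe_count(volume_info):
    # Gluster v10 or greater no longer reports a stripe count in the
    # volume info output, so fall back to the default value of 1.
    return volume_info.get("stripeCount", DEFAULT_STRIPE_COUNT)
```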

2022-05-11  Nir Soffer  <nsoffer@redhat.com>

	thinp: Extend drive immediately on events
	Expose the periodic.dispatch() function, allowing immediate
	dispatching of calls on the periodic executor. This is useful when you
	want to handle libvirt events on the periodic executor.

	The first user of this facility is the thinp volume monitor. Now when
	we receive a block threshold or enospc event, we use the periodic
	dispatch to extend the relevant drive immediately. This eliminates the
	0-2 second wait after receiving an event.

	Here are test results from 4 runs, each writing 50 GiB to a thin disk
	at ~1300 MiB/s. Each run extends the disk 20 times. The VM was not
	paused during the test.

	| time        |  min  |  avg  |  max  |
	|-------------|-------|-------|-------|
	| total       |  0.77 |  1.15 |  1.39 |
	| extend      |  0.55 |  0.92 |  1.14 |
	| refresh     |  0.16 |  0.22 |  0.31 |
	| wait        |  0.01 |  0.01 |  0.03 |

	Fixes: #85

	thinp: Support immediate extend on events
	If the monitor is configured with an executor dispatch function, it
	dispatches a call to extend the drive when getting a block threshold or
	enospc events. The drive will be extended on the first available
	executor worker.

	Using a dispatch function keeps the thinp module decoupled from the
	periodic module, makes testing easy, and prepares for removing the
	periodic monitor and use thinp executor.

	The periodic monitor checks if the VM is ready for commands before
	monitoring. Testing extending of 4 drives at the same time shows that
	this check returns False randomly for no reason. Since this causes a
	delay before extending the drive, we don't do this check when
	receiving an event.

	We also don't check the migration status or use it when handling
	libvirt call failures, since it should only be needed for volume
	monitoring, which is disabled on the migration destination VM.

	When extending a drive we hold the drive monitor lock, to avoid
	conflicts with the periodic monitor and extend completion threads.

	thinp: Minimize duplicate extend requests
	To allow immediate extend when receiving a block threshold or enospc
	event, we need to handle the case when the monitor wakes up soon after
	we extended the drive, finds that the drive is exceeded, and tries to
	extend it again. This creates duplicate extend requests that should be
	harmless, but they make it hard to analyze extend stats.

	To avoid this issue we keep the time of the last extend in the drive.
	When the monitor handles an exceeded drive, it skips the drive if a
	previous extend was started recently and the extend timeout has not
	expired.

	We also need to handle another case, when a drive was extended
	recently, but setting the block threshold failed, and it should be
	extended again. If the guest writes quickly before the extend timeout
	expires, we want to extend the drive immediately.

	To allow immediate extend, add an "urgent" argument to
	_handle_exceeded(). It is set when we find that the drive needs to be
	extended in _handle_unset(). It will also be used when receiving block
	threshold or enospc events.

	New `thinp:extend_timeout` option added to control the timeout.
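
	The duplicate-extend guard can be sketched as follows
	(`thinp:extend_timeout` is the real option name from the commit
	message; the default value and these helpers are assumptions, using a
	monotonic clock and a per-drive last-extend timestamp):

```python
# Sketch of the duplicate-extend guard; thinp:extend_timeout is the
# real option name, the default value and these helpers are assumed.
import time

EXTEND_TIMEOUT = 10.0  # assumed default for thinp:extend_timeout

class Drive:
    def __init__(self):
        # Time of the last extend, kept in the drive.
        self.last_extend = 0.0

def should_skip_extend(drive, urgent, now=None):
    if urgent:
        # e.g. setting the block threshold failed and the guest is
        # writing quickly: extend immediately.
        return False
    if now is None:
        now = time.monotonic()
    # Skip the drive if a previous extend was started recently and
    # the extend timeout has not expired yet.
    return now - drive.last_extend < EXTEND_TIMEOUT
```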

	thinp: Fix race after a volume is refreshed
	When a volume was refreshed, it was still marked as exceeded. If the
	periodic monitor wakes up at this point, it can try to extend the
	drive again. This is an unlikely event, but if it happens there is no
	way to fix it without shutting down the VM.

	Avoid the race by locking the drive monitor lock before the refresh, and
	releasing the lock after we set a new block threshold. If the monitor
	tries to access the drive at this point, it will wait until the
	completion callback is completed or skip the check on timeout.

	If we cannot acquire the drive monitor lock in 60.0 seconds, we abort
	the refresh to avoid blocking the mailbox thread pool. This may happen
	if the drive monitor lock is held by another monitor thread blocked on
	inaccessible storage.

	A new `thinp:refresh_timeout` option was added to control the timeout.

	thinp: Lock drive during monitoring
	We want to extend drives as soon as possible when receiving events from
	libvirt. However we may receive multiple events at the same time, or the
	periodic monitor may wake up at the same time we want to handle an
	event. Running the monitor concurrently can lead to double extend
	messages and confusing logs.

	Handle multiple calls by locking the drive during monitoring.  If
	another thread tries to monitor a drive while the drive monitor lock
	is held, it waits up to 0.5 seconds for the lock and skips the check
	on timeout.

	A new `thinp:monitor_timeout` option was added to control the timeout.

	thinp: Fix handling of drive extended to maximum size
	The recent change to log a warning when an exceeded drive cannot be
	extended, since it was already extended to the maximum size, revealed
	that we try to monitor and extend such a drive every 2 seconds.

	Example failure flow with 20g drive:

	1. Vdsm extends the drive to 22g, and sets the threshold to 20g
	2. Guest writes to offset 20g
	3. Vdsm gets a block threshold event and marks the drive as exceeded
	4. Every 2 seconds, vdsm tries to extend the drive and logs a warning:

	    WARN  (periodic/0) [virt.vm] (vmId='d6eb4739-ccd3-4652-b6fc-7f4fd4a972ad')
	    Drive already extended to maximum size ...

	Why do we extend the drive to 22g if the drive capacity is 20g?
	When using qcow2 format, writing 20g of guest data requires additional
	space for qcow2 metadata. Vdsm allows up to 1.1 * capacity (22g).

	This is not a new issue - it has existed since we started to use block
	thresholds a few years ago. But since we did not log anything, the bug
	was hidden, consuming CPU cycles and increasing the chance of filling
	up the executor queue and delaying other tasks.

	To fix this issue, introduce a new threshold state: `DISABLED`. In this
	state the drive is not picked up for monitoring. If the drive is
	resized, we change the threshold state back to `UNSET` so the drive will
	be monitored again in the next monitor cycle.
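
	The new state handling can be sketched roughly like this (the state
	names follow the commit text; the transition helpers are
	illustrative):

```python
# Sketch of the DISABLED threshold state; the state names follow the
# commit text, the transition helpers are illustrative.
UNSET = "unset"
EXCEEDED = "exceeded"
DISABLED = "disabled"

def on_extend_completed(new_size, max_size):
    # A drive already extended to the maximum size (1.1 * capacity,
    # to leave room for qcow2 metadata) cannot grow further; disable
    # monitoring instead of retrying every 2 seconds.
    if new_size >= max_size:
        return DISABLED
    return UNSET

def on_resize(state):
    # Resizing the drive raises its maximum size, so monitoring can
    # resume in the next monitor cycle.
    return UNSET if state == DISABLED else state
```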

	tests: Update tests to use new configuration
	The tests were using the old configuration (1 GiB chunk size, 50%
	utilization). This is correct, but makes it hard to simulate some
	cases. Update to the current configuration (2.5 GiB chunk size, 20%
	utilization).

	thinp: Extract handlers for UNSET and EXCEEDED drives
	When monitoring drives, we have 2 expected threshold states:

	- UNSET: If the drive has enough free space, set a block threshold so
	  we will get an event when the drive should be extended. If the drive
	  does not have enough free space, mark it as EXCEEDED and start an
	  extend flow.

	- EXCEEDED: The drive was marked as EXCEEDED after receiving a block
	  threshold or ENOSPC event, or after we found that the drive does not
	  have enough free space. We start an extend flow.

	Extract _handle_unset() and _handle_exceeded(), each handling one
	threshold state. _should_extend_volume() was split into 2 simpler
	helpers, _can_extend_drive() and _drive_needs_extend().

2022-05-11  Ales Musil  <amusil@redhat.com>

	net: Fix the tuple assign
	The tuple returned from split_switch_type
	was not properly assigned to variables.
	This caused the if statement to always be true.
	It did not cause any issues, but the network capabilities
	were also getting OvS info, which wasn't necessary.

	Assign it properly so only the first dictionary is used.
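	The bug boils down to testing a tuple instead of its first element.
	A minimal sketch, with a hypothetical stand-in for the real helper:

```python
def split_switch_type(caps):
    # Hypothetical stand-in for the real helper, which returns a
    # (linux-bridge caps, OvS caps) tuple.
    return {}, {"ovs": True}


# Buggy pattern: the whole tuple is bound to one name. A non-empty
# tuple is always truthy, so the if branch never depends on the
# first dictionary.
info = split_switch_type(None)
assert bool(info)  # truthy even though the first dict is empty

# Fixed pattern: unpack the tuple so only the first dictionary is used.
info, _ovs_info = split_switch_type(None)
assert not info
```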

2022-05-11  Albert Esteve  <aesteve@redhat.com>

	lvm_test: improve vg test coverage
	Add a couple of tests to improve coverage for VG handling
	in the lvm module.

	lvm: fix typo

2022-05-10  Nir Soffer  <nsoffer@redhat.com>

	iscsi: Keep existing session on "session exists"
	If logging in to a node fails with "session exists" (error 15):

	    iscsiadm: initiator reported error (15 - session exists)

	We used to remove the node, which disconnects the node and removes it.
	This is not new behavior, but it seems that in 4.4 this cleanup was not
	effective in the case of logging in to the same connection more than
	once, and now it reliably disconnects the first node and leaves the
	host without any nodes, which makes it non-operational.

	Change iscsiadm to raise a new IscsiSessionExists error, and keep the
	existing session when handling this error.

	With this change, if you try to connect to the same target more than
	once, the host should end with one connected target.
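	The flow can be sketched as below. The exception name comes from
	this entry; the exit-code constant and helper functions are assumed
	for illustration.

```python
class IscsiSessionExists(Exception):
    """Raised when iscsiadm reports error 15 (session exists)."""


ISCSI_ERR_SESS_EXISTS = 15  # exit code from the quoted iscsiadm message


def node_login(run_login):
    # run_login is a stand-in for the real iscsiadm invocation,
    # returning (rc, err).
    rc, err = run_login()
    if rc == ISCSI_ERR_SESS_EXISTS:
        # Do not remove the node: that would disconnect the existing
        # session and leave the host without any nodes.
        raise IscsiSessionExists(err)
    if rc != 0:
        raise RuntimeError(err)


def connect(run_login, log):
    try:
        node_login(run_login)
    except IscsiSessionExists:
        # Keep the existing session; the target is already connected.
        log.append("reusing existing session")


log = []
connect(lambda: (15, "session exists"), log)
assert log == ["reusing existing session"]
```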

	Bug-Url: https://bugzilla.redhat.com/2083271

2022-05-09  Albert Esteve  <aesteve@redhat.com>

	lvm: remove six library
	We only support Python 3, so we can drop the six library completely.

	lvm_test: add note for non-active host req test
	'test_lv_deactivate_in_use' fails if the host is active
	in the engine when running the test. The host should be
	in maintenance mode before testing. This should be
	noted in the test.

	lvm_test: str format to f-string

	lvm_test: update tests to run on vdsm hosts
	Some tests run commands that try to change the lvm configuration
	with the --config option, but this is wrong for hosts configured
	for vdsm, since they use a devices file by default. Tests should
	be adapted to use --devices.

	This happens mostly in fixtures simulating stale devices,
	but other tests are also affected. Fix all occurrences and
	make them use `--devices` instead of `--config`.

	Fixes: #163

2022-05-09  Ales Musil  <amusil@redhat.com>

	net: Add python3-libnmstate as explicit dependency
	With the changes in nmstate-2.1, python3-libnmstate
	is not installed by default. This causes issues for
	the unit tests as they need the library to work properly.

	Add python3-libnmstate as an explicit dependency in the
	spec file as well, to make clear that it is required.

2022-05-04  Nir Soffer  <nsoffer@redhat.com>

	thinp: Validate allocation only when we use it
	We use the allocation reported by libvirt only when checking drives
	before getting a block threshold event, so we better check the
	allocation only when we need it. Getting an improbable allocation in
	other cases does not affect anything.

	thinp: Improve handling of improbable allocation
	If we detect an improbable allocation value, we pause the VM and raise.
	Improve this implementation to make it easier to work with.

	- Rename the exception to describe the issue better
	- Remove the error log since we raise. The top level error handler will
	  log a traceback with this error.
	- Don't handle this error in monitor_volumes(), so the error will
	  propagate to the top level error handler.
	- Use an f-string to format the exception instead of fragile %
	  formatting.
	- Simplify the exception message; all the important details are
	  already included in the block_info named tuple.
	- Add the missing test to make sure this behavior is not modified by
	  mistake.

	tests: Remove unneeded fake implementation
	FakeVM block_stats and query_block_stats are not used now. They were
	left by mistake when the tests using them were removed.

	thinp: Unify log when extending
	Because we always mark a drive for extension, there is no point in
	logging
	the threshold state. Unify the text "Requesting an extension" to match
	similar logs about extending the volume and the replica.

	thinp: Inline docstring comments
	Move comments from the docstring into the function to keep the comments
	closer to the code they explain.

	storage: Report changes in threshold state on events
	When handling block threshold or ENOSPC events, return True if the
	event changes the drive threshold state to EXCEEDED. This can be used
	by callers to detect when a drive should be extended immediately.

	thinp: Fix extending of exceeded volume
	When libvirt reports that the threshold was exceeded:

	    2022-04-09 23:30:01,179+0300 INFO  (libvirt/events) [virt.vm]
	    (vmId='724d41da-0b01-4d58-8a73-113906da1565') Block threshold
	    3221225472 exceeded by 196608 for drive 'sdb[1]'

	querying block status may show that allocation is smaller than the
	threshold:

	    2022-04-09 23:30:01,186+0300 DEBUG (periodic/1) [virt.vm]
	    (vmId='724d41da-0b01-4d58-8a73-113906da1565') Extension info for
	    drive sdb volume e1b76177-d925-497d-93ff-05ccb980f32a:
	    BlockInfo(index=1, name='sdb', allocation=3213557760,
	    capacity=53687091200, physical=5368709120, threshold=0)

	This happens when trying to extend a disk immediately after receiving a
	block threshold event.

	Change the check to always extend drives when threshold state is
	EXCEEDED without checking the amount of free space.

	Since we always extend, we don't need the workaround for libvirt bug
	when allocation is not reported.

	thinp: Fix forcing block threshold to exceeded
	When finding that a drive needs extension, we force the drive
	threshold state to exceeded to make sure we continue to monitor this
	drive until it is extended. This is good, but the way this was
	implemented is wrong, clearing the exceeded time. Use
	Drive.on_block_threshold() which does the right thing now.

	thinp: Fix handling of ENOSPC
	Previously we called monitor_volumes(), but this does nothing unless a
	drive block threshold is UNSET or EXCEEDED. If the guest is writing
	fast, it may exceed the block threshold before we set it, and in this
	case we will never get a block threshold event from libvirt.

	This can lead to a paused VM with a full disk that will never be
	extended. Since the disk is full, resuming the VM fails immediately
	with ENOSPC. But since the block threshold was already exceeded, qemu
	does not submit a block threshold event. The only way to recover is
	to shut down the VM.

	Fixed by unifying the way we handle ENOSPC and block threshold events;
	we mark a drive block threshold as EXCEEDED in both cases. The
	periodic monitor will pick up the drive in the next monitoring cycle.

	Since on_enospc() cannot handle the case of ENOSPC without a drive, we
	call it only when we know the drive. Fixing monitor_volumes() to
	handle this case is possible but requires a big change, and I don't
	think this case actually happens.

	This change has nice side effects:

	- Avoid the issue of blocking the libvirt event loop when
	  monitor_volumes() calls storage APIs that block.

	- Avoid noise in the logs when libvirt issues multiple ENOSPC events
	  for the same drive. We log changes in drive block threshold only
	  once.

	This change introduces up to 2 seconds of delay when resuming a
	paused VM. However, since the VM is already paused, the delay is not
	critical. This delay will be eliminated when we eliminate the delay
	when receiving block threshold events.

	storage: Fix handling of block threshold event
	Previously, calling the event handler twice would reset the exceeded
	time and log a bogus message. We should reset the exceeded time and
	log only when setting the block threshold to exceeded for the first
	time.
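	The idempotent handler described above can be sketched like this.
	The method name follows the entries above; the implementation
	details are assumptions for illustration, not vdsm's exact code.

```python
import time


class Drive:
    def __init__(self):
        self.threshold_state = "set"
        self.exceeded_time = None

    def on_block_threshold(self):
        # Reset the exceeded time only on the first transition to
        # exceeded; a second event must not reset the clock or log.
        if self.threshold_state != "exceeded":
            self.threshold_state = "exceeded"
            self.exceeded_time = time.monotonic()
            return True  # state changed - caller may extend right away
        return False


drive = Drive()
assert drive.on_block_threshold()      # first event: state changes
first = drive.exceeded_time
assert not drive.on_block_threshold()  # second event: no-op
assert drive.exceeded_time == first    # exceeded time preserved
```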

	thinp: Update stale comment
	We always use block threshold events. When the flag to disable block
	threshold event was removed, the comment was not updated.

	vm: Remove monitor_volumes()
	VM.monitor_volumes() was called only from the periodic job, which
	already accesses vm.volume_monitor for checking if monitoring is
	needed. Remove the unneeded method and change the periodic job and
	the tests to access vm.volume_monitor.monitor_volumes().

	thinp: Add on_enospc() handler
	When a VM is paused because of ENOSPC, call VolumeMonitor.on_enospc()
	instead of VolumeMonitor.monitor_volume(). This will make it easier
	to unify the way we handle block threshold events and I/O errors.

	vm: Lookup drive earlier in onIOError
	When a vm pauses because of I/O error, we need to mark the drive for
	extension. Look up the drive early and pass it to the helper that
	sends the event, instead of looking up the drive in the helper.

	thinp: Simplify monitor_volumes()
	Previously we returned True if some drives were extended, and the
	ENOSPC handler logged a message if some drives were extended. This
	log is not very useful.

	Simplify the interface by not returning anything. This will allow
	handling the monitoring request in the executor thread pool, and it
	simplifies the monitoring code.

	thinp: Separate block info update from extend check
	On each monitor cycle, we need to get block stats from libvirt and
	update the block info for every drive that needs monitoring. Based on
	the block info, we decide if the drive should be extended.

	Separate the block info update from the extend check. This makes the
	code easier to change and understand.

2022-05-02  Albert Esteve  <aesteve@redhat.com>

	Use requires_root decorator instead of checkSudo
	The current 'checkSudo' implementation is potentially buggy. The
	option '-l' serves no clear purpose and might even be wrong in recent
	sudo versions, causing a false positive and a later failure due to
	required sudo.

	Furthermore, some commands might fail for reasons unrelated to the
	sudo requirement, resulting in wrongfully skipped tests.

	- Use the 'requires_root' decorator instead of checkSudo, which safely
	  skips the test if the user is not root
	- Eventually will be necessary to fix or remove checkSudo.
	  Tracked at #152
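	A minimal sketch of such a decorator, assuming a unittest-style skip
	mechanism; the real vdsm helper may be implemented differently:

```python
import functools
import os
import unittest


def requires_root(f):
    # Safely skip the test when the current user is not root, instead
    # of probing sudo with 'sudo -l' as checkSudo did.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if os.geteuid() != 0:
            raise unittest.SkipTest("requires root")
        return f(*args, **kwargs)
    return wrapper


@requires_root
def test_privileged_operation():
    return "ran as root"
```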

	Fixes: #143

	Remove copyrighted text
	There is some copyrighted text in the file where just some data
	was required. Use (byte-)strings with informative text instead.

	Add ret code checks for ExecCmd Tests
	Tests checking 'execCmd' outputs were not verifying the return code,
	which could lead to a false positive in the case of 'testStdErr' or
	'testSudo'.

	Add assertion to check return code and improve coverage.

	Fixes: #143

2022-04-29  Albert Esteve  <aesteve@redhat.com>

	Create and document a 'clean-storage' make target
	Add a new make target named 'clean-storage', to make it consistent
	with other projects (e.g., ovirt-imageio), and as a counterpart to
	the already existing 'make storage'. This way both creation and
	cleanup are handled in the makefile itself.

	Document the change in the 'test/storage/README.md'.

	Improve readmes and development doc
	- Add copr virt-preview enablement to the developer's readme
	- Unify running-the-tests section and extend it with mentions to
	  'make storage' and 'make tests'
	- Update storage README "setting up" section to use make storage instead
	  of direct script invocation

	Create development doc file
	Create a developer's readme in the doc folder, move all
	developer-specific sections from the main README.md and test/README.md
	to doc/development.md, and link the new document from the main README.

	Update CI section in README
	Remove outdated parts from the original README:
	- Remove Travis CI section since it is not used anymore.
	- Update to the current GitHub CI flow and add the path to
	  the configuration file.

	Transform tests readme to Markdown
	Transform the tests README file to README.md and change format to fully
	comply with Markdown syntax.

2022-04-28  Albert Esteve  <aesteve@redhat.com>

	External API tests to use userstorage
	Make outofprocess_test use userstorage instead of temporary
	files.

	Temporary files caused 'test_write_file_direct_true_unaligned'
	to fail on file systems without a minimum block size for direct I/O
	(e.g., btrfs).

	userstorage avoids the issue as it creates a folder structure with a
	fixed sector size.

	The patch includes:
	- New fixture to handle user storage mount point
	- Update all tests in the 'External API' part to use the new fixture
	- Parametrize fixture to test multiple sector sizes in order to increase
	  coverage

	Fixes: #144

	Consolidate semantically equal endpoint params
	In 'copy_data', Endpoint attributes 'is_destination'
	and 'writable' are semantically the same as both are
	true only for the destination.

	However, this behaviour could lead to a potential
	programming bug. The attributes are not checked in the
	body of the constructor, and misuse could create a
	non-writable destination endpoint, which is not correct.

	To avoid this, remove 'writable' as a parameter of the
	constructor and as a class member, and keep only 'is_destination',
	deduced from the 'dest' parameter. This way we enforce
	the correct behavior.
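	The consolidation can be sketched as follows; the class shape is an
	illustrative assumption, only 'is_destination' and 'writable' come
	from this entry:

```python
class Endpoint:
    # 'writable' is no longer a constructor parameter; it is derived
    # from whether this endpoint is the destination, so the two
    # attributes can never disagree.
    def __init__(self, path, is_destination):
        self.path = path
        self.is_destination = is_destination

    @property
    def writable(self):
        # Only the destination endpoint may be written to.
        return self.is_destination


src = Endpoint("/src", is_destination=False)
dst = Endpoint("/dst", is_destination=True)
assert not src.writable and dst.writable
```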

	Fixes: #107

2022-04-21  Marcin Sobczyk  <msobczyk@redhat.com>

	Update OST trigger
	The removed logic has been implemented on OST side.

2022-04-19  Nir Soffer  <nsoffer@redhat.com>

	target: Support targets with multiple portals
	When creating a target, the default portal (0.0.0.0:3260) is created.
	To use multipath properly we need to create multiple portals, using
	multiple IP addresses or ports.

	Add a --portal argument, allowing creation of a portal listening on a
	specific address:port, or multiple portals.

	Here is an example command:

	    # ./target create 05 \
	        --lun-count 1 \
	        --cache \
	        --portal 192.168.122.34 \
	        --portal 192.168.122.35

	    Creating target
	      target_name:   05
	      target_iqn:    iqn.2003-01.org.alpine.05
	      target_dir:    /target/05
	      lun_count:     1
	      lun_size:      100 GiB
	      cache:         True
	      exists:        False
	      portals:       192.168.122.34:3260, 192.168.122.35:3260

	    Create target? [N/y]: y
	    ...

	This creates a target with 2 portals:

	    # targetcli /iscsi ls iqn.2003-01.org.alpine.05
	    o- iqn.2003-01.org.alpine.05 ......................................... [TPGs: 1]
	      o- tpg1 .................................................. [gen-acls, no-auth]
	        o- acls .......................................................... [ACLs: 0]
	        o- luns .......................................................... [LUNs: 1]
	        | o- lun0 ................ [fileio/05-00 (/target/05/00) (default_tg_pt_gp)]
	        o- portals .................................................... [Portals: 2]
	          o- 192.168.122.34:3260 .............................................. [OK]
	          o- 192.168.122.35:3260 .............................................. [OK]

	If you have older targets listening on the default portal, you need to
	remove the default portal from these targets and add a portal
	listening on a specific address:

	    # target /iscsi/iqn.2003-01.org.alpine.01/tpg1/portals delete 0.0.0.0 3260
	    # target /iscsi/iqn.2003-01.org.alpine.01/tpg1/portals create 192.168.122.34

	After creating the first non-default portal, you cannot create new
	targets with the default portal, so practically you always need to
	specify --portal. But existing setups using the default portal can
	continue to use this tool without specifying --portal.

	logger.conf: Add separate mailbox logger
	The storage mailbox debug logs are extremely noisy since we added
	mailbox events. Add a separate logger for `storage.mailbox` so
	enabling DEBUG logs for the `storage` logger does not change the
	storage mailbox logs.

	We want to keep it possible to use DEBUG level logs for the mailbox,
	but this must be done explicitly.

	To enable storage mailbox DEBUG logs permanently, you can modify
	/etc/vdsm/logger.conf:

	    [logger_mailbox]
	    level=DEBUG
	    handlers=logthread
	    qualname=storage.mailbox
	    propagate=0

	and restart vdsmd service.

	To enable storage mailbox logs temporarily, use:

	    # vdsm-client Host setLogLevel level=DEBUG name=storage.mailbox

	Run again with level=INFO to silence these logs.

	Fixes: #135

	mailbox: Silence common logs
	There is no need to log the command sending mail to the SPM at INFO
	level, and there is no need to log the command parameters since they
	are logged by the mailbox command runner anyway.

	The mailbox event logs are nice for debugging, but not needed at INFO
	level.

	mailbox: Run commands with storage.mailbox logger
	Using the same logger makes it easy to control the noisy mailbox log
	if you need to enable debug logs.

	mailbox: Run all commands with same runner
	Most commands are using _mboxExecCmd, which is implemented using
	misc.execCmd, but 2 commands were run using misc.execCmd directly.
	Unify all commands to use the same runner.

	mailbox: Use same logger
	The mailbox classes use a logger per class. This is unneeded since we
	log the thread names, and we will move most code to a module logger.
	To keep this change minimal and easy to backport, I only change the
	logger names so the class loggers use the same logger name
	"storage.mailbox".

	We will switch to module logger later for oVirt 4.5.1.

2022-04-18  Arik Hadas  <ahadas@redhat.com>

	rephrase comment in Volume#create
	Fix a typo:
	smaller then -> smaller than
	And minor rephrasing:
	qemu tries to access -> qemu attempts to access

2022-04-14  Milan Zamazal  <mzamazal@redhat.com>

	New development cycle: 4.5.1

2022-04-13  Albert Esteve  <aesteve@redhat.com>

	Update tests README. Move tox recreate command to troubleshooting.

2022-04-13  Sandro Bonazzola  <sbonazzo@redhat.com>

	spec: exclude ppc64le on el9
	ppc64le is not shipping qemu-kvm anymore.

2022-04-12  Albert Esteve  <aesteve@redhat.com>

	Update tests README

2022-04-07  Albert Esteve  <aesteve@redhat.com>

	Update README.md

2022-04-07  Germano Veit Michel  <germano@redhat.com>

	mailbox: fix typo on event receiving log
	There is a typo here: "Recived"
	
	2022-04-07 09:38:48,661+1000 INFO  (mailbox-spm) [storage.MailBox.SpmMailMonitor] Recived event: dea59300-f65a-485c-90f6-227d9c2823cc (mailbox:857)

2022-04-05  Nir Soffer  <nsoffer@redhat.com>

	fileVolume: Fix creation of tiny volumes
	When creating preallocated volume with size <= 4096, we try to allocate
	0 bytes:

	    /usr/libexec/vdsm/fallocate --offset 4096 0 /path...

	Which fails in fallocate() with EINVAL as expected, and then we raise:

	    VolumesZeroingError: Cannot zero out volume: /path...

	It looks like engine does not handle this error correctly, and the
	disk remains locked forever.

	Creating a volume of 4k bytes or less is not a real use case, but there
	is no technical reason why it should fail. Fixed by skipping the
	allocation if we don't have anything to allocate.
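	The fix amounts to skipping the allocation when there is nothing
	left to allocate. A minimal sketch with a stand-in for the fallocate
	helper; the function names and the 4096-byte header are assumptions
	based on the command line quoted above.

```python
calls = []


def run_fallocate(path, offset, length):
    # Stand-in for /usr/libexec/vdsm/fallocate.
    if length <= 0:
        raise ValueError("EINVAL: cannot allocate 0 bytes")
    calls.append((path, offset, length))


def allocate_volume(path, size, header=4096):
    # Skip the allocation when the first block already covers the
    # volume: fallocate with length 0 fails with EINVAL.
    remaining = size - header
    if remaining > 0:
        run_fallocate(path, offset=header, length=remaining)


allocate_volume("/path/tiny", 4096)    # no call, no error
allocate_volume("/path/big", 1024**3)
assert calls == [("/path/big", 4096, 1024**3 - 4096)]
```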

	Fixes: 84ef2dd05fe1 (fileSD: Do not use qemu-img preallocation=falloc)
	Reported-by: Janos Bonic <jpasztor@redhat.com>

2022-04-04  Nir Soffer  <nsoffer@redhat.com>

	tests: Refine thinp stress test
	Make the thinp test script easier to use for testing, and nicer as an
	example for an RHV demo.

	- Add command line arguments so there is no need to edit the script.
	- Fix short writes - previously a short write would be ignored,
	  reporting an incorrect actual rate and total size.
	- Fix the actual rate - previously it did not consider throttling.
	- Add correction time for throttling, accounting for the time to
	  sleep and to get the next timestamp in the next iteration.
	- Round the actual rate for more stable results.

	Example usage with this change:

	    $ python tests/storage/stress/thinp.py --rate 1024 --size 10 thinp.data
	    10.00 GiB, 10.00 s, 1024.1 MiB/s

2022-04-03  Nir Soffer  <nsoffer@redhat.com>

	tests: Test mailbox roundtrip with event disabled
	Reuse the roundtrip() helper to run one test with mailbox events
	disabled.

	If we compare the round trip tests with and without events, we can see
	that using events we get the expected improvement.

	$ tox -e storage tests/storage/mailbox_test.py -- \
	    -k 'test_roundtrip_events_enabled[8-0.05]' \
	    --log-cli-level=info \
	    | grep 'stats'
	18:03:19,873 INFO    (MainThread) [test] stats: messages=8 delay=0.050
	best=0.169 average=0.263 worst=0.325 (mailbox_test:339)

	$ tox -e storage tests/storage/mailbox_test.py -- \
	    -k 'test_roundtrip_events_disabled' \
	    --log-cli-level=info \
	    | grep 'stats'
	18:04:51,923 INFO    (MainThread) [test] stats: messages=8 delay=0.050
	best=0.840 average=1.036 worst=1.214 (mailbox_test:364)

	tests: Separate the roundtrip flow from the test
	Extract a roundtrip() helper so we can test the same flow also with
	events disabled.

	tests: Simplify roundtrip parameters
	We don't test failures, so we don't care about the wait after 9
	failures, and can make the parameters more compact.

	tests: Verify roundtrip stats
	Verify that best, average and worst roundtrip times are as expected.
	This
	should protect from performance regressions in the mailbox.

	Testing on GitHub shows that the tests can be 2 times slower compared
	with a local run. We use larger timeouts to avoid random failures.

	The docstring explains how to run the roundtrip tests and inspect the
	results manually.

	tests: Refine mailbox roundtrip timeouts
	The roundtrip test is fast enough now regardless of the number of
	messages, and our CI is pretty good, so we don't need extreme
	timeouts.

	Use time.monotonic() instead of time.time(), this is more reliable in
	case system time changes during a run.

	If a timeout expires, raise RuntimeError instead of assert. This is not
	a test failure but an error.

	Use EVENT_INTERVAL for sleeping, since with a smaller interval we
	can detect a cleared mailbox faster.
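	The three changes above combine into a wait loop along these lines;
	a hedged sketch, with the function name and message assumed:

```python
import time

EVENT_INTERVAL = 0.5  # seconds, matching the mailbox events interval


def wait_for(condition, timeout):
    # time.monotonic() is immune to system clock changes during a run.
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            # A timeout is an environment error, not a failed
            # assertion about the code under test, so raise instead
            # of using assert.
            raise RuntimeError("timed out waiting for condition")
        time.sleep(EVENT_INTERVAL)


wait_for(lambda: True, timeout=1.0)  # condition already met, returns
```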

2022-04-03  Pavel Bar  <pbar@redhat.com>

	storage: fix typos
	From reading the code I noticed a few typos in the comments.
	1. Fixed 2 typos.
	2. "needs_monitoring()" method's comment clarification.

2022-04-01  Nir Soffer  <nsoffer@redhat.com>

	api: Add missing added info
	Recent arguments added to the schema were missing the "added" property.

	Fixes: 1e964f537971 (qemuimg: Support measuring sub chain)
	Fixes: f592c261fa39 (vm: Support pre-extended volumes for replication)
	Thanks: Pavel Bar <pbar@redhat.com>

2022-03-31  Nir Soffer  <nsoffer@redhat.com>

	mailbox: Configurable event interval
	Add a "mailbox:events_interval" configuration, allowing a longer or
	shorter events interval without changing the code. This is useful for
	testing the best value in scale tests or for tuning in production.

	To change the configuration, users can add a file on all hosts like:

	    $ cat /etc/vdsm/vdsm.conf.d/99-local.conf
	    [mailbox]
	    events_interval = 0.25

	And restart the vdsmd service.

2022-03-29  Harel Braha  <hbraha@redhat.com>

	New release: 4.50.0.11

2022-03-28  Nir Soffer  <nsoffer@redhat.com>

	mailbox: Enable mailbox events by default
	Enable mailbox:events_enable by default to minimize mailbox latency
	and improve virtual machine robustness with thin disks.

	mailbox: Minimize messages latency
	Minimize message latency by introducing mailbox events mechanism. The
	unused host 0 mailbox is used now for sending and receiving mailbox
	events.

	To control mailbox events, a new "mailbox:events_enable" option was
	added. The option is disabled by default, so we can test this change
	before we enable it by default, or disable it in production if
	needed.

	To enable mailbox events, add a drop-in configuration file to all hosts:

	    $ cat /etc/vdsm/vdsm.conf.d/99-local.conf
	    [mailbox]
	    events_enable = true

	And restart the vdsmd service.

	When mailbox:events_enable option is enabled:

	- Hosts write an event to host 0 mailbox after sending mail to the SPM.

	- The SPM monitors host 0 mailbox every eventInterval (0.5 seconds)
	  between monitor intervals, so it can handle new messages quickly.

	- When hosts wait for reply from the SPM, they monitor their inbox every
	  eventInterval (0.5 seconds), so they detect the reply quickly.

	- The host reports a new "mailbox_events" capability. This can be
	  used by engine to optimize mailbox I/O when all hosts in a data
	  center support this capability.

	With this change, extend roundtrip latency was reduced from 2.0-4.0
	seconds to 0.5-1.0 seconds, reducing the risk of pausing a VM when
	writing quickly to fast storage.

	Testing shows that we can write now 525 MiB/s (the maximum rate on my
	nested test environment) in the guest without pausing a VM when the disk
	is extended. Before this change, we could write only 350 MiB/s before
	the VM starts to pause randomly during the test.

	Here are example logs from this run, showing that the total extend
	time is 1.14-3.31 seconds instead of 2.5-8.3 seconds before this
	change.

	    <Clock(total=1.65, wait=0.28, extend-volume=1.09, refresh-volume=0.28)>
	    <Clock(total=2.67, wait=1.80, extend-volume=0.58, refresh-volume=0.29)>
	    <Clock(total=3.10, wait=1.74, extend-volume=1.10, refresh-volume=0.25)>
	    <Clock(total=2.85, wait=1.55, extend-volume=1.08, refresh-volume=0.22)>
	    <Clock(total=2.02, wait=1.14, extend-volume=0.58, refresh-volume=0.30)>
	    <Clock(total=1.14, wait=0.33, extend-volume=0.56, refresh-volume=0.25)>
	    <Clock(total=2.83, wait=1.42, extend-volume=1.09, refresh-volume=0.32)>
	    <Clock(total=1.68, wait=0.33, extend-volume=1.10, refresh-volume=0.25)>
	    <Clock(total=2.45, wait=1.47, extend-volume=0.60, refresh-volume=0.38)>
	    <Clock(total=1.44, wait=0.11, extend-volume=1.09, refresh-volume=0.24)>
	    <Clock(total=2.46, wait=1.04, extend-volume=1.13, refresh-volume=0.30)>
	    <Clock(total=1.55, wait=0.17, extend-volume=1.07, refresh-volume=0.31)>
	    <Clock(total=2.02, wait=1.13, extend-volume=0.60, refresh-volume=0.28)>
	    <Clock(total=1.75, wait=0.39, extend-volume=1.12, refresh-volume=0.24)>
	    <Clock(total=2.98, wait=1.61, extend-volume=1.12, refresh-volume=0.25)>
	    <Clock(total=1.28, wait=0.41, extend-volume=0.61, refresh-volume=0.25)>
	    <Clock(total=2.46, wait=1.48, extend-volume=0.62, refresh-volume=0.36)>
	    <Clock(total=3.31, wait=1.97, extend-volume=1.12, refresh-volume=0.21)>
	    <Clock(total=1.81, wait=0.96, extend-volume=0.58, refresh-volume=0.27)>
	    <Clock(total=2.74, wait=1.87, extend-volume=0.58, refresh-volume=0.29)>

	I also tested a shorter eventInterval (0.2 seconds). This reduces the
	extend time by 20%, but doubles the CPU usage of the SPM mailbox.

	With this change, the next improvement is eliminating the wait time. In
	the worst case, we waited 1.97 seconds before sending an extend request,
	which is 68% of the total extend time (3.31 seconds). This issue is
	tracked in #85.

	Fixes #102.

	tests: Add thinp extension stress test
	The new test script should run in a guest to trigger extension with a
	configurable write rate, testing the maximum write rate before a VM
	starts to pause during extension.

	Example usage - running in a Fedora 35 VM:

	    # lsblk
	    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
	    sda      8:0    0    6G  0 disk
	    ├─sda1   8:1    0    1M  0 part
	    ├─sda2   8:2    0    1G  0 part /boot
	    └─sda3   8:3    0    5G  0 part /
	    sdb      8:16   0   50G  0 disk
	    sr0     11:0    1 1024M  0 rom
	    zram0  252:0    0  891M  0 disk [SWAP]

	    # python3 thinp.py
	    50.00 GiB, 97.53 s, 524.98 MiB/s

	contrib: Collect extend stats
	Add an extend-stats tool collecting thin volume extend stats from the
	vdsm log.

	Example usage:

	    $ contrib/extend-stats </var/log/vdsm/vdsm.log

	    Total time
	    min=1.050 avg=2.162 max=3.470

	    Wait time
	    min=0.090 avg=1.011 max=1.980

	    Extend time
	    min=0.560 avg=0.864 max=1.150

	    Refresh time
	    min=0.190 avg=0.285 max=0.600
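	The core of such a tool is parsing the <Clock(...)> lines shown
	earlier in this log and aggregating the times. A minimal sketch;
	the regex and helper are illustrative, not the actual contrib
	script.

```python
import re

# Matches the <Clock(...)> log lines quoted earlier in this log.
CLOCK = re.compile(
    r"<Clock\(total=(?P<total>[\d.]+), wait=(?P<wait>[\d.]+), "
    r"extend-volume=(?P<extend>[\d.]+), "
    r"refresh-volume=(?P<refresh>[\d.]+)\)>")


def stats(lines):
    # Collect total extend times and report min/avg/max.
    totals = [float(m.group("total"))
              for m in map(CLOCK.search, lines) if m]
    return min(totals), sum(totals) / len(totals), max(totals)


lines = [
    "<Clock(total=1.65, wait=0.28, extend-volume=1.09, refresh-volume=0.28)>",
    "<Clock(total=2.67, wait=1.80, extend-volume=0.58, refresh-volume=0.29)>",
]
lo, avg, hi = stats(lines)
assert (lo, hi) == (1.65, 2.67)
```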

2022-03-24  Mark Kemel  <mkemel@redhat.com>

	Adding tests for hotplug with SCSI reservations
	These tests cover parsing VM XML for Direct LUN hotplug with SCSI
	Reservations, Drive creation and serializing to XML for libvirt

2022-03-22  Benny Zlotnik  <bzlotnik@redhat.com>

	copy_data: only skip lock for div endpoint
	We need to set lock_image unless both endpoints are DIV, and we need
	to check img_id only if both are DIV; otherwise this operation will
	fail with a KeyError, as external endpoints do not have the img_id
	property.

	This patch sets img_id to None for the external endpoint and checks
	within the div endpoint if we need to lock the source image.
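	One possible reading of this decision, sketched below. This is an
	assumption-heavy illustration: the function name, the img_id
	comparison in the both-DIV case, and the endpoint shape are all
	hypothetical, not the patch's exact code.

```python
from types import SimpleNamespace


def lock_image(src, dst):
    # img_id may be consulted only when both endpoints are DIV;
    # external endpoints have no img_id at all. When an external
    # endpoint is involved, take the lock unconditionally.
    if src.is_div and dst.is_div:
        return src.img_id != dst.img_id
    return True


div = SimpleNamespace(is_div=True, img_id="img-1")
ext = SimpleNamespace(is_div=False, img_id=None)  # external endpoint
assert lock_image(div, ext)  # no img_id lookup on the external side
```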

	Bug-Url: https://bugzilla.redhat.com/2066285

2022-03-22  Mark Kemel  <mkemel@redhat.com>

	Add SCSI reservations support to hotplug API
	During Direct LUN disk hotplug with SCSI Reservation, the reservation
	is not enabled. The configuration received by vdsm from the engine
	contains the 'reservations' parameter, but the one transferred to
	libvirt does not.

	Added support for the 'reservations' parameter to both VM
	metadata deserialization and serialization.

	Bug-Url: https://bugzilla.redhat.com/2028481

2022-03-22  Nir Soffer  <nsoffer@redhat.com>

	qemuimg: Allow measuring active image
	Add an unsafe argument to allow measuring an active image using the
	--force-share,-U option. Enable the unsafe option for the
	Volume.measure() API so engine can measure active images.

	Vdsm reports a new capability "measure_active" to allow engine to
	measure active images with older cluster versions.

	Bug-Url: https://bugzilla.redhat.com/2064907

	qemuio: Add helper for locking images
	Add a helper to open an image in write mode, for testing access to
	active images. Testing the helper shows that it works only for the
	qcow2 format. This may be a bug or incorrect usage, so the raw test
	is marked as xfail for now.

	qemuimg: Support measuring sub chain
	Add new "base" argument, enabling measuring of a sub chain, without the
	rest of the backing chain:

	    parent <- [base <- top] <- child

	The typical use case is estimating the size of the base volume after
	merging the top volume into the base.

	This will allow extending the base volume to the right value before
	the merge, instead of over-allocating, which cannot be fixed in live
	merge.

	The API will allow engine to measure volumes before a merge instead
	of guessing badly, assuming vdsm internal details like chunk size
	and qcow2 metadata overhead.

	Currently we support only chains with 2 elements, since we don't
	support merging longer chains. If support for longer chains is added
	in the future, we can remove the limit in the measure implementation.

	Vdsm reports a new "measure_subchain" capability to allow engine to
	measure a sub-chain with older cluster versions.

	Bug-Url: https://bugzilla.redhat.com/2064907

	tests: Test backing chain with file storage
	Measure without a backing chain was tested only with block storage.
	Add tests for file storage and refactor the code creating the chain
	out of the actual test.

	Due to the way we create block storage, we cannot yet use the same
	storage in multiple tests, but if we change the way storage is
	defined we can use module or session scoped fixtures.

	vm: Support pre-extended volumes for replication
	Disk replication during live storage migration (LSM) is racy; we
	start mirroring the active volume to the destination storage domain,
	and then send an initial extension request. The initial extension
	request is likely to be too late, since the destination volume
	created by engine is too small (1g).

	Doing the initial extension on the vdsm side is hard, requiring a
	mechanism similar to live merge initial extension. But in this case
	we can use a much simpler approach; engine can create the volumes
	large enough so the initial extension request is never needed.

	Add a needExtend argument to VM.diskReplicateStart. The default value
	is true, so when a new engine works with an older vdsm, nothing
	changes. If engine created the volumes large enough, it will specify
	false and vdsm will skip the initial extension.

	Engine can tell if a host supports this argument using the
	"replicate_extend" host capability.

	Bug-Url: https://bugzilla.redhat.com/1958032

2022-03-22  Ales Musil  <amusil@redhat.com>

	net, ovs: Fix empty OVN bridge mappings
	Openvswitch does not accept '""' and complains about it; change it to
	a simple empty string instead.

2022-03-21  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: allow modifying manual CPU pinning during migration
	The original implementation was pre-creating XML elements early in
	the VM.migrate call. The rationale was to validate the passed
	parameters and avoid as many errors as possible later in the
	migration thread. However, this makes it useful only for CPU policies
	that do not have manual pinning (otherwise it would lead to duplicate
	elements in the final XML).

	To make it possible to also reconfigure manual CPU pinning during
	migration, the code was split. The parameter verification is still
	done early, but the XML is constructed later in the source thread as
	needed.

	virt: allow modifying NUMA pinning during migration
	When migrating a NUMA VM to another host, the pinning to physical
	NUMA may need to be updated. This may be because the corresponding
	pNUMA used on the source is in use by other VMs on the destination,
	or because the hardware does not exactly match between hosts.

2022-03-16  Ales Musil  <amusil@redhat.com>

	automation: Fail early in the container makefile
	The makefile did not fail early when a container build failed, which
	buried the real failure in the CI logs.
	Fail right away so the error is easier to find.

	net: Add crypto policy workaround for el9s
	The copr packages were signed with SHA1, which was removed on el9s;
	until the relevant BZ [0] is fixed, set the crypto policy to LEGACY.

	[0] https://bugzilla.redhat.com/2059101

2022-03-15  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.0.10

	gitignore: Add build-aux/config.* generated files

2022-03-15  Nir Soffer  <nsoffer@redhat.com>

	multipath: Fix generated multipath.conf
	When running multipath we see new warnings:

	    # multipath -ll
	    Mar 15 03:11:33 | ignoring extra data starting with 'devices,' on line 119 of /etc/multipath.conf
	    Mar 15 03:11:33 | /etc/multipath.conf line 119, invalid keyword: Blacklist
	    Mar 15 03:11:33 | ignoring extra data starting with 'be' on line 120 of /etc/multipath.conf
	    Mar 15 03:11:33 | /etc/multipath.conf line 120, invalid keyword: should

	A recent change in multipath replaced part of a comment with text;
	comment it properly.

	Fixes a08988ed30538b32f87b198120b27d98ba7e3de1

2022-03-14  Nir Soffer  <nsoffer@redhat.com>

	nbd: Create overlay only if needed
	If we export a volume with a parent, but the bitmap does not exist in
	the parent, we don't need to create an overlay. The overlay is needed
	only when the bitmap exists in the volume backing chain, since
	qemu-nbd exports the bitmap only from the top volume.

	Now we create an overlay only if the bitmap chain has more than one
	item. If it contains only one item, it must be the exported volume
	(enforced by _find_bitmap).

	nbd: Always verify bitmap
	Previously we checked that the bitmap exists only if the volume had a
	parent, when we create an overlay. If we export a single volume and
	the bitmap is missing, qemu-nbd starts normally, but exits
	immediately when it tries to export the bitmap. Since we detected
	that qemu-nbd started, NBD.start_server() succeeds, and engine
	reports the transfer as ready. When the user tries to connect to the
	imageio server, it fails since qemu-nbd is not running.

	Here is an example failure:

	    Mar 08 01:44:43 host4 systemd[1]: Started /usr/bin/qemu-nbd --socket
	    /run/vdsm/nbd/54653918-63b2-49c1-8e7f-3c2d47db2bfc.sock --persistent --shared=8
	    --export-name= --cache=none --aio=native --allocation-depth --read-only
	    --bitmap=859e94f0-aaab-4a3b-bd65-a7dc1b8d822b json:{"driver": "qcow2", "file":
	    {"driver": "host_device", "filename": "/rhev/...-4c9808da12e8"}}.

	    Mar 08 01:44:43 host4 qemu-nbd[27200]: qemu-nbd: Bitmap
	    '859e94f0-aaab-4a3b-bd65-a7dc1b8d822b' is not found

	Fix the issue by verifying that the bitmap exists in all cases, and
	failing with the same error we use when a bitmap is missing or
	invalid in the backing chain. This fails the transfer earlier, which
	makes it easier to debug the issue.

	Regardless of this fix, we need to understand why qemu-nbd starts
	normally when the bitmap is missing. This smells like a bug in qemu-nbd,
	creating a server socket before it validates the bitmap argument.

	nbd: Include volume in bitmap error
	When failing because a bitmap does not exist, include the volume in
	the error message. This is more consistent with the way we fail in
	_find_bitmap() if a bitmap is missing in one of the volumes. This
	should make it easier to debug issues with missing bitmaps.

	tests: Add failing tests for bitmap validation
	When exporting a bitmap from a single volume we did not validate that
	the bitmap exists and is valid. Add failing tests reproducing the
	error seen when testing with a broken engine.

	tests: Rename bitmap validation tests
	All the tests handle only the case of exporting a volume with a
	backing chain. We need to add similar tests for a single volume.

2022-03-14  Milan Zamazal  <mzamazal@redhat.com>

	supervdsm: virt: Fix mdev_delete docstring

	virt: Add support for vGPU driver parameters
	Engine is going to add a new metadata item for vGPU devices,
	mdevDriverParameters.  It specifies driver parameters to be set when
	the mediated device is created, by writing a file in /sys.

	Since mediated devices are created before each use and destroyed after
	each use, it's enough to write the driver parameters, if specified,
	only when the device is created.  It's not necessary to clear them
	when the device is removed, although it is possible to do it by
	writing a space to the driver parameters file in /sys.

	Currently, only NVIDIA driver parameters are supported.  If setting
	driver parameters is attempted for other cards or if setting the
	driver parameters fails for another reason, creation of the mediated
	device will fail and the VM won't be started.

	Bug-Url: https://bugzilla.redhat.com/1987121

	New release: 4.50.0.9

2022-03-09  Shani Leviim  <sleviim@redhat.com>

	Add a blacklist for rbd devices
	When using Ceph, multipath might prevent the rbd device from being
	unmapped, so it stays mapped.
	This results in the following error while operating on the disk:
	rbd.ImageBusy: [errno 16] RBD image is busy
	This patch adds rbd to the multipath blacklist.

	Bug-Url: https://bugzilla.redhat.com/1881832
	Bug-Url: https://bugzilla.redhat.com/1755801

2022-03-02  Harel Braha  <hbraha@redhat.com>

	New release: 4.50.0.8

2022-03-01  Nir Soffer  <nsoffer@redhat.com>

	tests: Test creating volumes with new bitmap
	Add tests for creating a volume or snapshot with a new bitmap with
	local and block storage domains.

	It turns out that the lvm tests are now broken on Fedora 35, since it
	does not support the --devices option. The tests can run on a
	CentOS/RHEL host.

	volume: Support adding a bitmap to new volume
	When creating a new volume or a snapshot, caller can specify
	bitmap={bitmap-uuid} to create a new empty bitmap in the new volume.

	This will enable hybrid backup, where every backup is started by
	creating a temporary snapshot. The new bitmap records the changes to
	the volume since the snapshot was created. When the temporary
	snapshot is deleted, the new bitmap will be copied to the parent
	volume and will be used for the next incremental backup.

	Like add_bitmaps, we treat creating a new bitmap as best effort. If
	the operation fails, we don't fail the creation of the volume, but
	the next backup using this bitmap will have to be a full backup.
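	The best-effort pattern described above can be sketched as follows;
	the function and the add_bitmap callable are illustrative, not vdsm's
	actual API:

```python
import logging

log = logging.getLogger("storage.volume")

def silently_add_bitmap(vol_path, bitmap, add_bitmap):
    # Best effort: if adding the bitmap fails, log the error and
    # continue; volume creation itself is not failed, but the next
    # backup using this bitmap will have to be a full backup.
    try:
        add_bitmap(vol_path, bitmap)
        return True
    except Exception:
        log.exception("Failed to add bitmap %s to %s", bitmap, vol_path)
        return False
```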

	volume: Extract _silent_clone_bitmaps
	Extract the code for adding bitmaps from a parent volume to its child
	when creating a snapshot into a helper. This will make it easier to
	keep common code for manipulating bitmaps in the same place.

	tests: Fix illegal volume test
	Creating a new block storage domain requires root.

	Fixes 0c55e217da78 (storage: support creating illegal volumes)

	tests: Skip glance tests
	The tests depend on an external server. When the server is down,
	calls time out and fail the entire test run.

	To enable these tests we must have a quick way to check if glance is up.
	Until we have this, the tests will be skipped.

2022-03-01  Vojtech Juranek  <vjuranek@redhat.com>

	CODEOWNERS: Remove me from code owners
	Moving to a different project; removing myself from code owners so as
	not to block code reviews.

2022-02-23  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.0.7

2022-02-23  Nir Soffer  <nsoffer@redhat.com>

	config: Increase thin provisioning threshold
	Collecting extend stats shows that an extend takes between 2.2 and
	6.2 seconds, with an average of 3.7 seconds. With the default
	thresholds:

	[irs]
	volume_utilization_chunk_mb = 1024
	volume_utilization_percent = 50

	This means that we extend the volume when free space is 512 MiB. Writing
	more than 512 MiB in 3.7 seconds (138.4 MiB/s) will cause the VM to
	pause with ENOSPC.

	This configuration was too low 10 years ago, and we need to update it
	for modern storage. Update the values to allow 4 times faster writes
	before we pause with ENOSPC.

	With the new configuration:

	[irs]
	volume_utilization_chunk_mb = 2560
	volume_utilization_percent = 20

	We extend the volume when free space is 2048 MiB.
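	The relationship between these options and the extension trigger can
	be sketched as follows (an illustrative helper, not vdsm code; the
	formula matches the two examples above):

```python
def extend_threshold_mib(chunk_mb, utilization_percent):
    # We extend when free space drops below the unused fraction of
    # one chunk: chunk_mb * (100 - utilization_percent) / 100.
    return chunk_mb * (100 - utilization_percent) // 100

# Old defaults: extend when free space drops below 512 MiB.
assert extend_threshold_mib(1024, 50) == 512
# New defaults: extend when free space drops below 2048 MiB.
assert extend_threshold_mib(2560, 20) == 2048
```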

	Testing with the old and new configuration shows that we can now cope
	with a 4x faster write rate before VMs pause during extend.

	Before:

	write rate  extends   pauses
	----------------------------
	 75 MiB/s        50        0
	100 MiB/s        50        4
	125 MiB/s        50        4
	150 MiB/s        53       24

	After:

	write rate  extends   pauses
	----------------------------
	200 MiB/s        20        0
	250 MiB/s        20        0
	300 MiB/s        20        0
	350 MiB/s        21        0
	400 MiB/s        20        1
	450 MiB/s        20        2
	500 MiB/s        22        7
	550 MiB/s        23        7

	The downside of this change is allocating more space in the storage
	domain. A new empty disk will consume 2.5 GiB instead of 1 GiB.

	Bug-Url: https://bugzilla.redhat.com/2051997

	tests: Fix merge test mocking
	This horrible test was depending on the default configuration instead
	of using the config, and is written in a way that makes it hard to
	use the config.

	The horrible make_env() context manager, using the crappy
	storagetestlib, was mocking everything after creating the volumes
	that need the mocking to consider the configuration. Change it to
	create everything inside the mock context.

	tests: Use config instead of hard coded value
	The set threshold test was using a hard coded value assuming the old
	vdsm configuration. Change the test to use the vdsm configuration so
	it does not break when the configuration is changed.

	tests: Fix unsafe use of config values
	Tests depending on configuration options must use a mocked config
	object to avoid failing when the configuration is modified, or when
	running on a host with a non-default config.

2022-02-22  Pavel Bar  <pbar@redhat.com>

	backup: adding the 3 new IDs to scratch disk dictionary
	1. Add domain, image, and volume IDs to scratch disk dictionary
	(if they are not 'None').
	The received `backup_disk.drive.scratch_disk` dictionary
	will look like this:
	{
	  "index": 4,
	  "sd_id": "b48cc39a-63e4-47e9-8a0a-8665c1422ef7",
	  "img_id": "2adaf310-c251-4118-8167-e9928f8489d2",
	  "vol_id": "30937b05-ffca-4a1d-879f-91c47ec743a7"
	}
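	A sketch of how such a dictionary could be built, including the new
	IDs only when they are not None (the helper name is illustrative, not
	vdsm's):

```python
def scratch_disk_dict(index, sd_id=None, img_id=None, vol_id=None):
    # The index is always present; the 3 new IDs are added only
    # when they are available (not None).
    d = {"index": index}
    for key, value in (("sd_id", sd_id),
                       ("img_id", img_id),
                       ("vol_id", vol_id)):
        if value is not None:
            d[key] = value
    return d
```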

	2. Add a new test scenario for simulating sending the 3 new
	parameters from engine to VDSM.

	Bug-Url: https://bugzilla.redhat.com/1913389

	backup: adding the 3 new properties to 'ScratchDiskConfig'
	Adding the 3 new properties to the "ScratchDiskConfig" object
	that is part of the bigger "BackupConfig" configuration object
	used by backup functionality of VDSM.
	The new properties' names are:
	1. sd_id
	2. img_id
	3. vol_id
	Previously this useful data was missing; now it can be used to
	implement the functionality to minimize the scratch disk initial
	size.

	Bug-Url: https://bugzilla.redhat.com/1913389

	backup: adding the 3 new properties to VDSM scratch disk API
	Add domainID, imageID, and volumeID to {File,Block}ScratchDisk.

	Bug-Url: https://bugzilla.redhat.com/1913389

	backup: fix typo in 'BlockScratchDisk' description
	Fixing a copy-paste typo in "BlockScratchDisk" description.

	Bug-Url: https://bugzilla.redhat.com/1913389

2022-02-22  Ales Musil  <amusil@redhat.com>

	network: Run network tests every week after new containers are built
	By running tests after the container is built we can find any possible
	regression caused by newer packages in the container.

2022-02-22  Nir Soffer  <nsoffer@redhat.com>

	tests: Test drive exceeded time behavior
	The way we set and clear the exceeded time is a little too delicate.
	Let's protect it with a test.

	The new test is added as a simple function since the relevant class
	was not converted properly to pytest, and we cannot use pytest
	fixtures in the class. Converting the module to pytest properly will
	have to be done later.

2022-02-21  Sandro Bonazzola  <sbonazzo@redhat.com>

	spec: point to src URL

2022-02-21  Nir Soffer  <nsoffer@redhat.com>

	thinp: Include the wait time in total extend time
	If we received a block threshold event, add a new "wait" step,
	measuring the time from when the block threshold event was received
	until we handled it.

	The "total" time now shows the time from when we found that the drive
	was exceeded until the new size is visible to the guest. This time is
	important for setting up good chunk size and utilization options.

	Example extend timing with this change:

	2022-02-21 12:32:45,915+0200 INFO  (mailbox-hsm/1) [virt.vm]
	(vmId='d0730833-98aa-4138-94d1-8497763807c7') Extend volume
	3b435e98-f6b1-40ab-b3de-18b403049e7a completed <Clock(total=3.85,
	wait=1.55, extend-volume=2.10, refresh-volume=0.20)> (thinp:567)

	common: Support measuring since a time in the past
	There are cases when we want to measure time since a point in the
	past. For example, when we measure the time to extend a volume, we
	want to measure the total time since we received the block threshold
	event.

	Add an optional start_time to Clock.start(). When specified, the
	clock will measure the time since the specified start time, instead
	of the time the clock was started.

	A new test shows an example of this use case.
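	The idea can be sketched with a simplified Clock (an illustration
	only; vdsm's real Clock also formats the timings for logging):

```python
import time

class Clock:
    """Simplified clock measuring named intervals."""

    def __init__(self):
        self._started = {}
        self.timings = {}

    def start(self, name, start_time=None):
        # When start_time is given, measure since that point in the
        # past instead of since now.
        self._started[name] = (
            start_time if start_time is not None else time.monotonic())

    def stop(self, name):
        self.timings[name] = time.monotonic() - self._started.pop(name)
        return self.timings[name]

# Measure a "wait" step since an event received 1.5 seconds ago.
clock = Clock()
clock.start("wait", start_time=time.monotonic() - 1.5)
assert clock.stop("wait") >= 1.5
```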

2022-02-21  Vojtech Juranek  <vjuranek@redhat.com>

	storage: don't fail when engine sends empty connection list
	When multiple iSCSI SDs use the same iSCSI target and one of these
	SDs is put into maintenance, engine sends vdsm a disconnect storage
	server request with an empty connection list. While it's questionable
	why engine sends such a request, vdsm still has to handle it and do
	nothing, as the target is still used by other SDs. Do nothing in this
	case, but log a warning, because this should be fixed in the engine;
	engine shouldn't request disconnecting a storage server when it's
	used by other SDs. As the calling code expects an iterable with
	results, return an empty list.
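	The resulting behavior can be sketched like this (illustrative code,
	not the actual vdsm implementation; the per-connection _disconnect
	helper is a stand-in):

```python
import logging

log = logging.getLogger("storage")

def _disconnect(con):
    # Stand-in for the real per-connection disconnect logic.
    return {"id": con["id"], "status": 0}

def disconnect_storage_server(con_list):
    if not con_list:
        # Engine may send an empty list when the target is still used
        # by other SDs; warn and return an iterable as callers expect.
        log.warning("Empty connection list, nothing to disconnect")
        return []
    return [_disconnect(con) for con in con_list]
```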

	Bug-Url: https://bugzilla.redhat.com/2054745

	storage: avoid undefined variable when connection list is empty
	When connecting or disconnecting a storage server, the current code
	assumes that the connection list sent by engine is never empty.
	However, this is not the case: when there are two or more iSCSI SDs
	which use the same iSCSI target, engine sends an empty connection
	list when requesting disconnect, see BZ #2054745 [1] for more
	details. In such a case the current code fails with

	    UnboundLocalError: local variable 'con_info' referenced before assignment

	which is bad in any case. Avoid using the potentially undefined
	variable.

	Handling of an empty connection list will be improved in a follow-up
	patch.

	[1] https://bugzilla.redhat.com/2054745

2022-02-17  Harel Braha  <hbraha@redhat.com>

	vdsm-tool: Update libvirt sasl password to scram-sha-256
	Currently the SASL password uses SHA-1, which is deprecated; we
	should move to SHA-256.

	vdsm-tool: Create libvirt conf file for el9
	In el9, we use the monolithic daemon libvirtd, so we need to create a
	libvirtd config manually since it's not shipped by default.

2022-02-17  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.0.6

2022-02-17  Ales Musil  <amusil@redhat.com>

	net: Fix tc tokens parsing
	Newer versions of iproute-tc (>=5.15.0-3) omit the "???" from the
	filter output. Also include "not_in_hw" for potential consumption;
	this will ensure that both formats are accepted, e.g.
	".. terminal flowid ??? not_in_hw .." and
	".. terminal flowid not_in_hw ..". One important thing to note: it is
	safe to consume "not_in_hw" because the value is otherwise skipped by
	the parser.
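	The parsing idea can be sketched as follows (an illustrative token
	consumer, not the actual vdsm parser):

```python
# Markers that may or may not appear after "flowid", depending on
# the iproute-tc version; both are safe to consume and skip.
OPTIONAL_MARKERS = {"???", "not_in_hw"}

def next_value(tokens):
    # Return the first token that is not an optional marker,
    # so both output formats are accepted.
    for token in tokens:
        if token not in OPTIONAL_MARKERS:
            return token
    return None
```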

2022-02-16  Vojtěch Juránek  <vjuranek@redhat.com>

	lvm: use devices file as default config method
	Switch the default from filter based LVM configuration to devices
	file configuration. Any newly installed host, or a host upgraded to a
	vdsm version containing this commit, will start to use the LVM
	devices file, and the LVM filter will be ignored.

	If the user has any issue with this setup, it can be reverted by
	creating a custom vdsm config file (or modifying it if such a file
	already exists) and setting config_method to "filter" in the lvm
	section, e.g.:

	    $ cat /etc/vdsm/vdsm.conf.d/99-lvm.conf
	    [lvm]
	    config_method= filter

2022-02-15  Nir Soffer  <nsoffer@redhat.com>

	tests: Test monitoring_needed()
	This is a very simple method, but it was not tested before. In the past
	we tested _monitored_volumes(), which was very complicated, but after
	all the logic moved to Drive.needs_monitoring(), it became trivial.

	All the interesting and important cases we test are already in the
	Drive tests, so there is no need to test them again here. Simplify
	the test by creating a trivial FakeDrive class implementing only
	needs_monitoring().

	Testing monitoring_needed() also tests _monitored_volumes(), enable()
	and disable().

	This change deletes 300 lines, but test coverage is still 76%.

	tests: Remove tests for _query_block_stats()
	The code is already tested in thinp_monitor_test.py. We generate
	libvirt stats from test data and parse it in every call to
	monitor_volume().

	The actual code is pretty simple and does not need special test.

	With or without this test, we have 76% test coverage.

	tests: Update thinp_monitor_test.py TODOs
	We had an old comment about possibly missing tests. It turns out that
	the tests are still missing. Update the comment to make it clearer.

	Remove TODO for extension test, it seems that we test this basic case.

	tests: Add missing assert
	When testing the case of a missed block threshold event, we checked
	that we set the threshold, but not that we continue and extend the
	drive. Add the missing assert.

	tests: Don't test private methods
	Replace a test calling 2 private methods with a similar test for
	monitoring volumes when the drives' threshold was already set and the
	drives do not need extension.

	tests: Fix ancient invalid test
	Remove evil faking of VolumeMonitor._update_threshold_state_exceeded().
	The evil faking was added when we fixed the monitor to set a drive as
	exceeded. This broke existing test, so the author faked the call for the
	test to make the test pass. I guess this was done as a temporary work in
	progress hack, but somehow it got merged.

	To fix the test, we added support for specifying replica size. The size
	is the size of the logical volume used as replica. This enables
	simulating the case when the original volume is on file storage, but the
	replica volume is on block storage.

	Since the test setup is correct, we don't try to extend the volume,
	and the test passes without faking the private method.

	Fixes: c284b8202f0d151ef6b7a592baaa4d8490a14607

	tests: Rename drive_extension_test
	These tests belong now to the thinp tests, but they are too big and
	complex to merge into thinp_test.py:

	$ wc -l tests/virt/thinp_*.py
	  766 tests/virt/thinp_monitor_test.py
	  620 tests/virt/thinp_test.py
	 1386 total

	Keep the tests as a separate file with a better name.

2022-02-09  Ales Musil  <amusil@redhat.com>

	net, tests: Remove skip from bond MAC address test
	The relevant bug was resolved, let's remove the skip.

	net, tests: Remove unicode bridge name test
	As discussed in [0], systemd-udevd won't support unicode characters
	in el9stream. Let's remove the test, as oVirt does not support
	unicode bridge names.

	[0] https://bugzilla.redhat.com/2005367

2022-02-09  Vojtěch Juránek  <vjuranek@redhat.com>

	lvm: use configured devices file value
	Currently the path to the lvm devices file is hardcoded in the
	lvmdevices module and uses the default value. This would break if the
	user for some reason configured lvm to use a custom devices file.

	Use the configured lvm devices file value when checking whether the
	devices file exists.

	lvm: add function for obtaining lvm config value
	The lvmconf module provides a way to get or set a value in the main
	lvm config file, but we have no way to find out which value of an lvm
	option will actually be used by lvm. This can be obtained with the
	`lvmconfig` tool, which provides, among other things, such info.

	Add a function for obtaining the configured value of an lvm option.
	The search also takes into account default values which are not
	explicitly configured.

	lvm: use common constant for lvm executable
	We have a common constant defined for the lvm executable. Use this
	constant instead of a hardcoded one. The constant can be easily
	overridden in case of need, and we also don't duplicate the code in
	various modules.

	lvm: remove absolute import from lvmconf module
	Not needed any more, we are on py3.

2022-02-09  Ravi  <kottapar@gmail.com>

	updated url for vdsm-api.yml
	The current url gives a 404, as this file was moved to lib/vdsm/api

2022-02-07  Vojtěch Juránek  <vjuranek@redhat.com>

	lvm: use devices if vdsm is configured so
	If vdsm is configured so, use the `--devices` argument in lvm
	commands instead of the lvm filter. Otherwise use the lvm filter as
	before.

	lvm: refactor LVMCache._addExtraCfg()
	Separate preparing the list of devices from building the lvm filter.
	This will allow adding support for the `--devices` argument more
	easily and cleanly.

	lvm: cache devices instead of filter
	To be able to use the `--devices` argument in lvm commands, we need
	only the list of devices, not the filter. On the other hand, the
	filter can be easily constructed from the device list. Cache only the
	allowed devices and use them for constructing the lvm filter. Also
	rename the related methods to reflect that we cache devices and not
	the filter.

	lvm: refactor _buildFilter() function
	To be able to replace the filter with devices in lvm commands, start
	with refactoring the _buildFilter() function. Factor out the part
	which constructs the list of allowed devices. This can also be reused
	for constructing the `--devices` argument for lvm commands.

	This is a preparation step for caching devices instead of the filter.
	The switch to caching devices will be done in a follow-up patch.

	lvm: make filter in config optional
	When the lvm devices file is used, the lvm filter is ignored and
	there's no need to include it in each lvm command. Make the filter
	optional in the lvm config, so that we can easily skip it.

	storage: fix typos in _invalidate_lvm_filter() doc string

	tests: use lvm fake runner
	Use the LVM fake runner to test the LVM config appended to the
	commands instead of calling private functions.

2022-02-07  Nir Soffer  <nsoffer@redhat.com>

	thinp: Spelling fixes
	Fix a few spelling errors in comments and docstrings.

	thinp: Group methods by category
	We don't have any support for this in editors, but this makes it easier
	to learn the code.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Rename get_block_stats()
	The get_ prefix is too generic; we now use the query_ prefix for
	various methods querying libvirt. This helps to understand when we
	perform slow or expensive queries.

	Bug-Url: https://bugzilla.redhat.com/1913387

	thinp: Extract _query_block_info() helper
	Eliminate duplicate code for getting and updating drive block info by
	extracting a new _query_block_info() helper. Now we have one place we
	need to change to work with volumes instead of drives.

	Bug-Url: https://bugzilla.redhat.com/1913387

	thinp: Add query_block_info()
	Before starting live storage migration we pre-extend the drive. For
	this we had to get block stats from the volume monitor, amend the
	block info to get drive replica info in some cases, and finally
	update the drive. Extract VolumeMonitor.query_block_info() to do all
	this.

	This removes the last leftovers of monitoring related code from the
	VM. But more importantly, now we can make
	VolumeMonitor.get_block_stats() and the ugly
	VolumeMonitor.amend_block_info() private, since they are used only in
	VolumeMonitor.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Move amend_block_info() to thinp
	This method is part of the extending flow. It was kept in the VM
	because it uses VM.getVolumeSize(), but the volume monitor already
	depends on that, and the method is not really related to the VM.

	This is another ugly hack that will be removed when volume monitoring is
	used for all volumes. Until we remove it, it should live in thinp.

	Bug-Url: https://bugzilla.redhat.com/1913387

	thinp: Don't call VM.extend_volume()
	VM.extend_volume() now calls VolumeMonitor.extend_volume(). Code in
	VolumeMonitor can call this directly.

	Bug-Url: https://bugzilla.redhat.com/1913387

	thinp: Make methods private
	Before modifying this class, we need to simplify it. Make all the
	methods that are not accessed outside of the class private.

	This change reveals a few issues:

	- We have code calling clear_threshold directly, but set_threshold is
	  called only internally. This will be fixed by introducing a
	  start/stop monitoring API.

	- Some tests test the internal implementation instead of the
	  behavior. These tests now have a TODO to use only the public API. I
	  think some tests will be fixed by adding a start/stop monitoring
	  API.

	Bug-Url: https://bugzilla.redhat.com/1913387

	thinp: Make index required
	We don't have any caller that uses set_threshold() or
	clear_threshold() without an index. Make the index required and
	delete the test for setting a threshold without an index.

	The format_target() helper was modified to fail if index is not an
	integer, e.g. None.
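	A sketch of what such a helper might look like, assuming
	libvirt-style block threshold targets such as "sda[1]" (illustrative,
	not the exact vdsm code):

```python
def format_target(name, index):
    # Build a block threshold target like "sda[1]". The index is
    # required and must be an integer, so passing None fails fast.
    if not isinstance(index, int):
        raise ValueError("Invalid index: %r" % (index,))
    return "%s[%d]" % (name, index)
```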

	Bug-Url: https://bugzilla.redhat.com/1913387

2022-02-03  Ales Musil  <amusil@redhat.com>

	net: Add systemd to all network containers
	For some reason the build done by GH actions won't install systemd in
	the el9 containers, although a local build does. Add the systemd
	dependency explicitly to prevent the issue.

2022-02-03  Nir Soffer  <nsoffer@redhat.com>

	github: Fix OST condition
	The current if condition always matches, so every comment will trigger
	OST run.

	For example, this comment:
	https://github.com/oVirt/vdsm/pull/49#issuecomment-1028473401

	Triggered this run:
	https://github.com/oVirt/vdsm/actions/runs/1786861822

	I could not find documentation for this, but the issue is mentioned here:
	https://github.community/t/how-to-write-multi-line-condition-in-if/128477

	Looks like "|" and "${{}}" do not work together. For multi-line if we
	should use only "|".

	Fixes: 588a421d1c97d6eeb5784a162a72bca5f05dd15f

2022-02-02  Vojtěch Juránek  <vjuranek@redhat.com>

	lvm: always disable devices file when filter should be used
	If there are any issues with the devices file on the host, or the
	user wants to use the lvm filter for any other reason, always disable
	the devices file in the lvm config. It can have been turned on by a
	previous run of vdsm config, or it needs to be overridden on
	CentOS 9, where the devices file is enabled by default.

	lvm: disable devices file when searching for mounts
	Disable the devices file when searching for mounts and use the
	filter. This is correct, as this code is called either when vdsm is
	configured to use the filter, and therefore the devices file should
	be disabled, or when vdsm is configured to use the devices file but
	is not configured yet.

	This allows us to avoid lvm warnings that we mix the devices file and
	the lvm filter. Such a situation happens when a user configured a
	host with the devices file and later wants to revert the
	configuration back to the filter.

	lvm: move filters into config_with_filter() function

	lvm: add vdsm option to choose configuration method
	Allow the user to choose the configuration method. This is useful for
	testing, but also in cases when something went wrong and we need to
	fall back to the lvm filter configuration.

	As the switch to the devices file is not completed yet, the old
	'filter' method is still the default.

	Bug-Url: https://bugzilla.redhat.com/2012830

	lvm: add config print outs
	Print a config message for the user about configuring lvm devices,
	and adjust the existing print outs so they can also be reused for
	devices.

	lvm: add lvm device configuration
	Add a function for configuring the lvm devices file and adjust
	lvmdevices.configure() to also remove the lvm filter.

	In follow-up patches, printing info and a config summary will be
	added, as well as the config option for choosing the config method.

	Bug-Url: https://bugzilla.redhat.com/2012830

	lvm: squash all print outs into one function
	Bug-Url: https://bugzilla.redhat.com/2012830

	lvm: refactor lvm filter config
	Move various print outs into a separate function so the code flow is
	clearer and some print outs can be reused later.

	Move the filter configuration into a separate function as well, as
	later on another configuration possibility (using the devices file)
	will be added.

	Bug-Url: https://bugzilla.redhat.com/2012830

	lvm: return also VG name in MountInfo
	For creating lvm devices, we need the VGs of mounted devices. This is
	what lvmfilter.find_lvm_mounts() does. Reuse this code and also
	return VG names, so we can use it for the lvm devices file as well.

	Bug-Url: https://bugzilla.redhat.com/2012830

	lvm: add function for removing lvm filter
	Once we configure lvm to use devices, we will eventually remove the
	existing lvm filter. Add a function for removing the filter from the
	lvm config file.

	Bug-Url: https://bugzilla.redhat.com/2012830

	lvm: run check after devices file configuration
	Run lvmdevices --check after configuring the devices file. As,
	according to LVM developers, the behavior of this functionality is
	not entirely or strictly well defined yet, don't raise any exception,
	just log a warning.

	lvm: add lvm devices file configuration
	Add a function for configuring the lvm devices file. It enables lvm
	devices and creates the initial devices file.

	As we call `vgimportdevices` with a command line config option which
	enables the devices file, we can create the devices file first and,
	when we succeed, enable the devices file in the lvm config.

	lvm: create initial lvm devices file
	If the lvm devices file wasn't enabled, after enabling it we need to
	create the initial devices file. Find all the mounts and
	corresponding VGs and import the devices using `vgimportdevices`. To
	avoid a misconfigured lvm filter, which is taken into account during
	vgimportdevices, enable all the devices.

	Bug-Url: https://bugzilla.redhat.com/2012830

	lvm: add function for configuring devices file
	Add a function to configure lvm to enable or disable the devices file.

	Bug-Url: https://bugzilla.redhat.com/2012830

	lvm: introduce lvmdevices module
	Create a new module which will contain functions related to the lvm
	devices file. Add a function which checks if the lvm devices file is
	configured. To use the devices file, lvm requires enabling
	devices/use_devicesfile in the lvm configuration, and the devices
	file must exist.

	Bug-Url: https://bugzilla.redhat.com/2012830

2022-02-02  Nir Soffer  <nsoffer@redhat.com>

	thinp: Unify callback names
	Unify the callback names with other callbacks. The callback is named
	{method_name}_completed to make it easy to find the callback when you
	look at the method, or the other way around.

	The storage code referred to the private callback when explaining why
	we don't refresh extended volumes on the SPM. This is bad, requiring
	unrelated changes when renaming private methods. Change the text to
	explain what we do without referring to the actual method name.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Move extending code to thinp
	Move the complex extending code to the thinp module. This will make it
	easier to monitor volumes instead of drives. The VM is not affected now
	by the way we monitor or extend volumes.

	The VM part during disk extension is now:

	- Call VolumeMonitor.monitor_volumes() periodically. This is done by
	  the periodic module.

	- Call VolumeMonitor.extend_volume() when volumes need to be
	  pre-extended. This is called when starting disk replication, and
	  before starting the commit phase in live merge. This will be
	  eliminated when we start to monitor the relevant volume.

	- Handle callbacks from VolumeMonitor._after_volume_extension:
	  - Refresh migration volume
	  - Refresh local volume
	  - Resume if needed when extend volume completed

	Because extending volumes is implemented now by the volume monitor, we
	have to use the real object in live merge test.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Merge methods for refreshing destination
	We had 2 confusing methods for refreshing the destination volume during
	migration. Since both of them are small and simple, and the private
	method should not be called in any other flow, merging them makes the
	code easier to follow.

	While merging the methods, streamline the way we raise on errors.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Make _refresh_destination_volume() public
	This will be called from volume monitor when extend volume completed
	successfully, so the volume monitor does not have to care about VM
	internals.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Rename _resume_if_needed and make it public
	We have the very confusing _resume_if_needed() and maybe_resume() -
	which method should be called to resume a VM? Rename
	_resume_if_needed() to extend_volume_completed(), like the live merge
	_extend_completed callback. Now it is clear that this should be
	called only when the extend flow completed, and other code should
	call maybe_resume().

	This new method is public so we can move the extend code to the thinp
	module.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Reorder extend methods
	Order the extend methods according to the flow:

	- extend volume (entry point)
	- extending replica (used during live storage migration)
	- extending volume
	- refreshing destination volume (used in migration)
	- refreshing local volume
	- verifying extension
	- updating drive size
	- resume vm

	Most of these methods should move to the thinp module. Reordering
	them helps to understand the dependencies and will make a cleaner
	patch when we move them.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Rename VM.extendDriveVolume
	This should be used to extend any volume (e.g. scratch disk). Rename the
	public and private methods to reflect the intended use.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Move monitoring methods to thinp
	Move VM.monitor_volumes() and VM.extend_drive_if_needed() to
	thinp.VolumeMonitor. This decouples the VM class from the VolumeMonitor
	class, which will make it easier to change the volume monitor to work
	with volumes instead of drives.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Make _amend_block_info() public
	This method is an ugly hack for monitoring drives during disk
	replication. This hack should be eliminated when the volume
	monitoring work is finished, but for now we need it public so we can
	move the monitoring code to thinp.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Make _drive_volume_index public
	This is needed when we start to monitor drive active volume, and I want
	to move that code to the thinp module. Rename to
	query_drive_volume_index() to match query_drive_volume_chain() so we can
	call it outside of the VM.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Rename VM.refresh_drive_volume
	This method works with any volume (e.g. scratch disk) and does not
	accept a drive object, so there is no reason to include the drive in
	the name.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Rename drive_get_actual_volume_chain
	I want to make _drive_volume_index public, since it is needed for
	moving thin provisioning code out of the VM class, so we need a good
	way to name drive related queries. Start by renaming existing queries.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Rename the __private methods
	There is no reason to use __ prefix for the methods related to extending
	drives. Using single _ prefix is enough to mark them as private. Using
	__ is needed when you want to allow a subclass to override a super class
	method but keep the option to call the super implementation.
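
	A minimal, standalone sketch of the name mangling behavior described
	above (illustrative only, not vdsm code):

```python
# Single _ marks a method private by convention and can be overridden.
# Double __ triggers name mangling, so a subclass method with the same
# name does not replace the superclass implementation.
class Base:
    def _extend(self):
        return "base extend"

    def __refresh(self):          # mangled to _Base__refresh
        return "base refresh"

    def run(self):
        # Still calls Base's implementation even if a subclass
        # defines its own __refresh.
        return self.__refresh()

class Child(Base):
    def _extend(self):            # overrides Base._extend
        return "child extend"

    def __refresh(self):          # mangled to _Child__refresh
        return "child refresh"
```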

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Make VM._after_volume_extension private
	This method became public for no reason when working on extending
	volumes during live merge. Make it private again.

	Bug-Url: https://bugzilla.redhat.com/1913387

	virt: Rename drive monitoring to volume monitoring
	In the past we could monitor only the active volume of a drive, so drive
	monitor was the right term.

	However, now we can monitor also:
	- Scratch disk volume
	- Internal volume during live merge (blockCommit)
	- Target volume during disk replication (blockCopy)

	Modifying the Drive object to keep monitoring info is hard and does not
	look like the right way. The plan is to keep monitoring info in the
	volume monitor.

	When we want to monitor a volume, we will call:

	   VolumeMonitor.start_monitoring(volume)

	This will set a block threshold on the volume using libvirt, and keep
	the volume state needed to handle block threshold events.

	When we want to stop monitoring a volume, we will call:

	   VolumeMonitor.stop_monitoring(volume)

	This will disable block threshold events and remove the volume from
	the monitor.

	When we get a block threshold event, the monitor will update the volume
	state.

	In the periodic job, we will query the volumes that need to be extended
	and extend them.

	When a drive changes the path, for example at the end of live storage
	migration or live merge, we will remove the old volume from the volume
	monitor, and add the new volume.
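
	The planned interface can be sketched roughly like this; the class
	body is an assumption based on the description above, not the actual
	thinp code:

```python
# Rough sketch of the planned VolumeMonitor: monitoring state lives in
# the monitor, keyed by volume, instead of on the Drive object.
class VolumeMonitor:
    def __init__(self):
        self._volumes = {}  # volume id -> monitoring state

    def start_monitoring(self, volume, threshold):
        # In vdsm this would also set a libvirt block threshold.
        self._volumes[volume] = {"threshold": threshold, "exceeded": False}

    def stop_monitoring(self, volume):
        # In vdsm this would also disable block threshold events.
        self._volumes.pop(volume, None)

    def on_block_threshold(self, volume):
        # Called when a libvirt block threshold event arrives.
        if volume in self._volumes:
            self._volumes[volume]["exceeded"] = True

    def volumes_to_extend(self):
        # Queried by the periodic job to extend what is needed.
        return [v for v, s in self._volumes.items() if s["exceeded"]]
```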

	Starting this work with renaming to make the intent of the code clear:

	- drivemonitor module to thinp - this module will include all the code
	  needed for handling thin provisioned block volumes.
	- drivemonitor.DriveMonitor class to thinp.VolumeMonitor.
	- VolumeMonitor.monitored_drives to VolumeMonitor.monitored_volumes.
	- VM.drive_monitor to VM.volume_monitor.
	- VM.monitor_drives to VM.monitor_volumes.
	- periodic.DriveWatermarkMonitor to periodic.VolumeWatermarkMonitor.
	- "drive monitor" to "volume monitor" in doc/thin-provisioning.md.

	Bug-Url: https://bugzilla.redhat.com/1913387

	Simplify VM._amend_block_info()
	This function is needed to amend block info when replicating a
	non-chunked drive to a chunked drive, but it was abused to set
	Drive.blockinfo, used to log changes in extension info.

	Convert Drive.blockinfo to a property, and move the logging code to
	the property setter. Set Drive.blockinfo separately from the call to
	amend the block info.

	This change allows amending block info only for drives during
	replication, which will make it easier to handle scratch disks in the
	monitoring flows.

	Bug-Url: https://bugzilla.redhat.com/1913387

2022-02-02  Harel Braha  <hbraha@redhat.com>

	github: Add OST workflow
	Start OST run for current pull request when adding a comment like:

	    /ost

	Replacing PR: https://github.com/oVirt/vdsm/pull/44

2022-02-02  Nir Soffer  <nsoffer@redhat.com>

	vm: Remove ancient unused method
	VM.getChunkedDrives() is not used since 2017, remove it.

	Fixes: 6a251845f4a237b9c9b78e5d7461bcd9c6184c3b

	lvm: Suppress warning about disabling device mapper
	Each time we extend a disk, we log this warning:

	2022-02-01 22:25:48,531+0200 WARN  (mailbox-spm/2) [storage.lvm] Command [...] succeeded with
	warnings: ['  WARNING: Activation disabled. No device-mapper interaction will be attempted.']
	(lvm:347)

	Suppress the warning, it is expected when using "--driverloaded n".

	Fixes: 9a2c37ece0cb9d5438c88a3b6ea3c7447ed235a1

	tests: Fix lvm suppress warning tests
	Verifying return code and filtered error was never run since it was done
	inside the pytest.raises() context. Verifying the error must be done
	outside of the context.

	Fixes: cbc9b542966a5fa070a1ba8fa72455eeecbf0d1e

2022-02-02  Ales Musil  <amusil@redhat.com>

	automation: Update network containers to use centos as base
	The tag for the build container has changed and the container itself
	has grown in size. With packages for JavaScript and Java included,
	there is no point in keeping it as the base.

2022-02-01  Ales Musil  <amusil@redhat.com>

	net: Lower severity of nic not configured for monitoring
	The message was logged at warning level, which results in some
	confusion and spam in the journal. Let's log it at debug level, as
	the intention is really to know what was happening with the
	monitoring.

2022-01-26  Martin Perina  <mperina@redhat.com>

	Modify support for cluster level 4.7
	As a part of 05eefda9703ec21a87428d4c8a552eb019f28101 we have introduced
	cluster level 4.7, which requires libvirt >= 7.10.

	But recently libvirt has been updated to 8.0, which should be the version
	included in RHEL 8.6 and CentOS Stream 9. So we are bumping cluster level
	4.7 to require libvirt >= 8.0

	Bug-Url: https://bugzilla.redhat.com/2021545

2022-01-26  Ales Musil  <amusil@redhat.com>

	Use oVirt upload artifacts action
	In order to use RPMs on OST we need to adopt the common
	naming. Use oVirt upload artifacts action that keeps
	the convention in one place.

2022-01-24  Nir Soffer  <nsoffer@redhat.com>

	github: Fix can_push condition
	When we set:

	    can_push: ${{ github.repository_owner == 'oVirt' }}

	Github converts the boolean value to the strings 'true' or 'false'[1],
	so checking:

	    if: ${{ env.can_push == true }}

	Does not match and now we never push to quay.

	Example run on my fork shows that we still skip the push on forks:
	https://github.com/nirs/vdsm/actions/runs/1740365670

	Unfortunately, the only way to test this properly is to merge to master.

	[1] https://docs.github.com/en/enterprise-server@3.0/actions/learn-github-actions/expressions#functions

	Fixes: 746975bb434e5fbb4c41c3e119728bc3322fb5e9

	github: Skip push to quay on forks
	The containers workflow fails every week in my vdsm fork because the
	ovirt organization secrets are not available in forks. Skip the push to
	quay if the repository is not owned by "oVirt" organization.

2022-01-20  Milan Zamazal  <mzamazal@redhat.com>

	github: Update codeql-analysis workflow
	The current version is obsolete and fails.  Let's update according to
	https://github.com/github/codeql-action

	virt: Add support for parallel migration connections
	libvirt added the capability of using several parallel migration
	connections to speed up migrations.  This patch adds support for a new
	migration parameter `parallel' to use the feature.

	It is expected that Engine will provide maxBandwidth argument to
	maximize throughput when parallel connections are requested.  If
	maxBandwidth is not provided, the default Vdsm limit is applied as
	usual.

	It is possible to explicitly disable parallel migration connections by
	setting the `parallel' parameter to 0.  This can be useful if implicit
	parallel migration connection setting for a VM is introduced in
	future and we need to disable it for a particular migration.

	Note that it is meaningful to set the number of parallel migration
	connections to 1.  In such a case, the feature is enabled for the
	migration, which means a different way of copying data is used in QEMU
	than without this feature enabled and it can make performance or
	reliability differences.

	This patch needs support on the Engine side to provide the new
	parameter.
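
	The semantics of the `parallel' parameter can be summarized in a
	small sketch (names are illustrative, not the actual vdsm code):

```python
# None: parameter absent, classic migration.
# 0:    parallel connections explicitly disabled.
# >= 1: feature enabled; even 1 switches QEMU to a different way of
#       copying data, which may affect performance or reliability.
def parallel_migration(parallel):
    """Return (enabled, connections) for a `parallel' parameter value."""
    if not parallel:   # covers both None and 0
        return False, None
    return True, parallel
```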

	Bug-Url: https://bugzilla.redhat.com/1975720

2022-01-18  Tomáš Golembiovský  <tgolembi@redhat.com>

	static: load vfio-pci module
	When PCI pass-through is used and the vfio-pci module is employed for
	the devices, it is useful to pre-load the module rather than wait for
	something else to load it. Without the module, Engine does not show
	any module in the PCI devices list, which is not very user friendly.

	Bug-Url: https://bugzilla.redhat.com/1960808

2022-01-18  Nir Soffer  <nsoffer@redhat.com>

	nbd: Disable allocation-depth for raw format
	qemu-nbd 6.2.0 introduced a bug[1] when accessing raw image extents.
	The first call works correctly, but in subsequent calls the entire
	image is reported as data.

	I could not reproduce the issue using the imageio API, so we are
	likely not affected by this bug. However, this revealed that we were
	not careful enough with the qemu-nbd configuration. We always enabled
	--allocation-depth, even though it is useless for raw format, always
	reporting depth=1.

	Change to enable --allocation-depth only for the qcow2 format, where
	it has meaningful output.

	[1] https://bugzilla.redhat.com/2041480
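
	The resulting behavior can be sketched like this; the command
	construction is a simplified illustration, not the actual vdsm code:

```python
# Enable --allocation-depth only for qcow2, where the NBD
# "qemu:allocation-depth" context reports useful chain depth;
# for raw images it would always report depth=1.
def qemu_nbd_command(image, fmt):
    cmd = ["qemu-nbd", "--format", fmt]
    if fmt == "qcow2":
        cmd.append("--allocation-depth")
    cmd.append(image)
    return cmd
```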

	tests: Adapt nbd tests to qemu-nbd 6.2
	Looks like zero detection when writing to qcow2 image has changed, and
	now the zeroes are converted to zero clusters. This looks like a nice
	bug fix compared with the previous behavior.

	Update the test so it does not check the actual image size for qcow2
	format, which is too fragile for testing anyway.

	tests: Fix path to supervdsmd
	It moved recently to static/libexec/vdsm but the path in nbd test was
	not updated.

2022-01-16  Benny Zlotnik  <bzlotnik@redhat.com>

	storage: add sequence metadata field
	This patch makes use of the previously introduced sequence metadata
	field, which will be used to help resolve the question of which leaf
	volume is newer when multiple leaves are present, which usually
	happens when converting disks or previewing snapshots.

	The field has a default value of 0; volumes which were created before
	this field was added will report the default value. The sequence is
	increased by 1 for each new volume, but there may be gaps when
	internal volumes are removed, so the only guarantee is that newer
	volumes have a higher sequence number.

	An example for volume metadata containing the new field:
	$ cat 13178a7f-2b1b-49c7-a99c-2458a4beef27.meta
	CAP=1073741824
	CTIME=1639041013
	DESCRIPTION={"DiskAlias":"","DiskDescription":"{\"DiskAlias\":\"disk\",\"DiskDescription\":\"\"}"}
	DISKTYPE=DATA
	DOMAIN=3a1bee04-5ba2-4197-9534-ca55b63f978b
	FORMAT=RAW
	GEN=1
	IMAGE=90e24f27-f786-4ec6-9002-326062238d8b
	LEGALITY=LEGAL
	PUUID=00000000-0000-0000-0000-000000000000
	SEQ=7
	TYPE=PREALLOCATED
	VOLTYPE=LEAF
	EOF
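
	Given the guarantee above, resolving the newer leaf can be sketched
	as follows (illustrative helper, not the actual vdsm code):

```python
# Pick the newer leaf by its SEQ value; volumes created before the
# field existed report the default 0, and gaps are allowed, so only
# the relative ordering of SEQ values matters.
def newer_leaf(leaves):
    # leaves: list of (volume_id, metadata dict) pairs.
    return max(leaves, key=lambda leaf: int(leaf[1].get("SEQ", 0)))[0]
```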

	Bug-Url: https://bugzilla.redhat.com/977778

	storage: add sequence field to volumemetadata
	This patch introduces the sequence metadata field in volumemetadata to
	be used later when determining which of two leaves is the newer one.

	Bug-Url: https://bugzilla.redhat.com/977778

2022-01-13  Nir Soffer  <nsoffer@redhat.com>

	CODEOWNERS: Add owners for doc/ and README.md
	Adding @oVirt/ovirt-documentation and all maintainers.

	CODEOWNERS: Add missing owners for virt modules
	Some of the virt modules are owned by storage team, and should be
	reviewed by storage team.

	CODEOWNERS: Fix typo in default rule

	vm: Fix VM._amend_block_info()
	In commit c63867bbd416f5b52cd8cc27a3e3557dc558394a

	    vm: Replace blockInfo with block stats

	We introduced a regression, using the wrong physical size from the
	source drive instead of the apparentsize of the replica disk. This
	may cause registering a wrong block threshold, which can cause a VM
	to pause during disk replication.

2022-01-12  Ales Musil  <amusil@redhat.com>

	net, tests: Change order of operations for dummy
	With recent NM there seems to be a regression which
	causes the dummy to not get the link-local IPv6 address [0].
	The workaround is to set the dummy up first and then set it
	as managed.

	automation: Add network tests
	Add network tests that will be triggered only when something in the
	network code changes; there is no need to run them with every vdsm
	change, as the functional tests can take a while.

2022-01-11  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.0.5

2022-01-11  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: yajsonrpc: Use TRAVIS_CI on broken tests
	We don't use jenkins CI anymore.

2022-01-11  Nir Soffer  <nsoffer@redhat.com>

	github: Add CODE_OF_CONDUCT.md file
	This is a copy of the oVirt community code of conduct:
	https://www.ovirt.org/community/about/community-guidelines.html

	github: Add CODEOWNERS file
	Github uses this file to assign reviewers to pull requests
	automatically.

	The project ownership model is messy, so we have a very long file. I
	think we should improve this later:
	- Move stuff that is not really common to the right group sub-package
	- Move all files under lib/vdsm/ to some common/ or group sub-package

	github: Update README.md about moving to github
	- Developers are expected to fork the project on github and submit pull
	  requests now.
	- For manual installation, users should pull the project from github.
	- Added a section about looking up code reviews in vdsm Gerrit project.

	More updates are needed about CI, will be done in followup patches.

2022-01-11  Ales Musil  <amusil@redhat.com>

	automation: Automatically build network containers
	In order to be up to date with centos changes, build the containers
	automatically every Sunday at midnight. Also add a manual trigger
	for the build so the containers can be rebuilt manually if needed.

	automation: Update base image for network containers
	The new buildcontainer image contains the basic dependencies and
	defines a common way to consume the oVirt master repo. Use it as the
	base for all network containers.

2022-01-11  Milan Zamazal  <mzamazal@redhat.com>

	tests: virt: Improve test_migrate_from_status
	- Add an explaining comment to the empty pause code.
	- Test NOERR pause code.

2022-01-11  Benny Zlotnik  <bzlotnik@redhat.com>

	copy_data: skip locking source when same image is used
	When we copy a volume and the target image is the same as the source
	image, we must skip locking the source with a shared lock and only
	lock the target with an exclusive lock. Locking both would lead to a
	deadlock and fail the operation.

	This patch checks if both source and target are the same image and skips or
	locks accordingly.
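
	The lock selection can be sketched as follows (illustrative, not the
	actual copy_data code):

```python
# Take an exclusive lock on the target; add a shared lock on the
# source only when it is a different image, since locking the same
# image twice would deadlock.
def required_locks(source_img, target_img):
    locks = [("exclusive", target_img)]
    if source_img != target_img:
        locks.append(("shared", source_img))
    return locks
```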

	Bug-Url: https://bugzilla.redhat.com/977778

	storage: support preparing illegal volumes
	As part of the effort to allow disk format conversion, we need to
	support creation and preparation of illegal volumes.

	When converting the format, an additional volume will be created under
	the same image, this disk will serve as the target disk for the
	conversion which will be done using SDM.copy_data. The target will be
	created illegal, and will remain so until the conversion is complete.

	This patch handles the preparation of an illegal volume.

	storage: support creating illegal volumes
	As part of the effort to allow disk format conversion, we need to
	support creation and preparation of illegal volumes.

	When converting the format, an additional volume will be created under
	the same image, this volume will serve as the target volume for the
	conversion which will be done using SDM.copy_data. The target will be
	created illegal, and will remain so until the conversion is complete.

	This can also potentially simplify the disk upload flow by removing an
	API call that changes the volume's legality and instead does this during
	creation.

	This patch adds the ability to control the legality of a volume in
	public APIs.

2022-01-07  Milan Zamazal  <mzamazal@redhat.com>

	virt: Permit resuming migrating VMs stopped due to I/O errors
	We block resuming VMs in MIGRATION_SOURCE status.  For good reasons,
	VMs are paused in the final stages of migrations and we don't want to
	mess with this, or generally with VM states during migrations.

	But it also means that if a VM gets temporarily stopped due to an I/O
	error while waiting for migration, it cannot be resumed if the storage
	problem disappears while still waiting for the migration.  If the VM
	is waiting on the ongoingMigrations semaphore, it may remain paused
	unnecessarily for a long time.

	We can improve this situation by checking whether the VM in
	MIGRATION_SOURCE status is paused on I/O error and resuming it in such
	a case.  libvirt prevents migrating such VMs and we can be sure that
	we don't interfere with a running migration when resuming such a VM.
	We make the check by asking libvirt directly, to be completely sure
	about the VM status.
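
	The check can be sketched like this; the constants mirror libvirt's
	virDomainState and virDomainPausedReason values, while the helper
	itself is an illustrative assumption:

```python
# Values from libvirt's virDomainState / virDomainPausedReason enums.
VIR_DOMAIN_PAUSED = 3
VIR_DOMAIN_PAUSED_IOERROR = 5

def may_resume_migration_source(dom):
    # dom is assumed to expose libvirt's state() -> (state, reason).
    # libvirt refuses to migrate VMs paused on I/O error, so resuming
    # such a VM cannot interfere with a running migration.
    state, reason = dom.state()
    return (state == VIR_DOMAIN_PAUSED
            and reason == VIR_DOMAIN_PAUSED_IOERROR)
```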

	Bug-Url: https://bugzilla.redhat.com/2010478

	virt: Postpone VM resume if blocked due to the VM status
	When a storage domain status changes, we attempt to resume all the
	VMs currently paused because of I/O errors.  If a given VM is down,
	hibernating or migrating, then the resume action is not performed.
	This is not a problem when the VM is already down or when it is
	hibernating (it will either go down on success or get resumed on
	failure).  But if the VM is migrating to another host, the migration
	will fail (libvirt cannot migrate VMs paused due to I/O errors),
	the chance to resume the VM is missed, and the VM may remain
	paused although it should be resumed.

	Let's fix the problem by storing a flag about a refused resume attempt
	and resuming the VM after a failed migration.

	Under some circumstances, e.g. when a VM is waiting on
	ongoingMigrations semaphore, it may be better and possible to resume
	the VM immediately.  This will be addressed in a separate patch.

	Bug-Url: https://bugzilla.redhat.com/2010478

	virt: Prevent migrations of VMs paused due to an I/O error
	Migrations of such VMs will be rejected by libvirt; it doesn't make
	sense to start the migration machinery for them.  It's also good to
	avoid resuming such VMs during migration preparation if possible,
	which may fail and prevent the VM from resuming later or which may
	reveal unhandled races or corner cases.

	Migration of such VMs is already prevented in Engine but let's add an
	additional check to Vdsm to handle races, similarly to _not_migrating
	API guard.

	This patch doesn't handle the case when the VM gets paused and
	resumed when the migration process is already running. It will be
	addressed in another patch.

	Bug-Url: https://bugzilla.redhat.com/2010478

2022-01-06  Filip Januska  <fjanuska@redhat.com>

	virt: add shutdown timer after console disconnect
	This patch adds a delay before calling the shutdown() method
	of a VM after the user disconnects from a console, if the
	consoleDisconnectAction is set to 'Shutdown'. The duration of
	this delay is set by the user in the ovirt-engine web UI and
	passed to vdsm when a VM ticket is created.

	VDSM will now wait for n minutes after the user disconnects from the
	console and then start shutting down the VM by calling vm.shutdown();
	the rest of the process remains the same. The shutdown can be
	interrupted during this period if the user reconnects to the console.
	The vm.shutdown() method has a 'delay' argument which could
	theoretically be used in this case, however it serves a different
	purpose than what this patch needs, so it's better not to combine
	them.

	Bug-Url: https://bugzilla.redhat.com/1944834

	concurrent: Add Timer class
	This patch adds a Timer class to the concurrent module.
	The class is based on and behaves pretty much the same
	as the threading.Timer class, except that the thread
	which carries out the target function is created with
	concurrent.thread instead of the regular
	threading.Thread. This makes the Timer class available
	for use in vdsm, while still ensuring every thread is
	created with concurrent.thread.

	The Timer also doesn't inherit from the threading.Thread
	class, like the threading.Timer does, but instead keeps
	the thread object as an attribute.
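
	A rough sketch of such a composition-based Timer; threading.Thread
	stands in here for concurrent.thread so the example is
	self-contained:

```python
import threading

# Timer by composition: keep the thread as an attribute instead of
# inheriting from threading.Thread, as described above.
class Timer:
    def __init__(self, interval, function, args=(), kwargs=None):
        self._interval = interval
        self._function = function
        self._args = args
        self._kwargs = kwargs or {}
        self._finished = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def cancel(self):
        # Prevent the function from running if it has not started yet.
        self._finished.set()

    def _run(self):
        # wait() returns True only if cancel() set the event in time.
        if not self._finished.wait(self._interval):
            self._function(*self._args, **self._kwargs)
        self._finished.set()
```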

2022-01-05  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.0.4

2022-01-03  Ales Musil  <amusil@redhat.com>

	virt: Add e1000e model
	Add e1000e model to VmInterface model and
	report the same virtual speed as for e1000.

2021-12-22  Nir Soffer  <nsoffer@redhat.com>

	backup: Keep scratch disk info in drive
	When using block based scratch disk, keep the scratch disk info in the
	Drive object during the backup.

	Currently we have only the index, which can be used to register block
	threshold events. We need to modify engine to send also the scratch disk
	domain, image, and volume ids so we can also extend them.

	Bug-Url: https://bugzilla.redhat.com/1913387

2021-12-21  Nir Soffer  <nsoffer@redhat.com>

	storageServer: Improve logging for iscsi connection
	We typically log an info message for every operation that modifies
	the system state. Discovering iscsi targets, adding and removing
	iscsi nodes from the iscsi db, and logging in and out of iscsi
	targets are such operations, but they had no logs.

	The new logs should make it easier to debug and support the system, and
	will likely reveal bugs we are not aware of. The new logs show only
	when connecting and disconnecting from iscsi targets and when
	discovering new targets.

	storageServer: Use log.exception
	We want to use only log.exception() to log tracebacks.

	storageServer: Simplify results collection
	Enhance the iscsi_login helper to log exceptions and return con, status
	tuple so we can simply append the result to the results list.

	storageServer: Extract max_workers() helper
	connect_all() is too long and hard to follow. Extract the
	uninteresting code that sets the number of workers into a helper.

	storageServer: Log iscsi login steps
	Currently we don't have any logs during iscsi login. Add info logs for
	the setup and login steps. We expect that the setup step will be very
	quick, but the login step may take much more time.

	Example logs:

	2021-12-20 10:48:46,577+0200 INFO  (jsonrpc/5) [storage.storageServer]
	Setting up 10 iscsi nodes (storageServer:595)

	2021-12-20 10:48:47,441+0200 INFO  (jsonrpc/5) [storage.storageServer]
	Log in to 10 targets using 10 workers (storageServer:624)

2021-12-21  Vojtěch Juránek  <vjuranek@redhat.com>

	tests: convert storageserver_test to pytest
	Convert storageserver_test to pytest. Also remove __future__ imports,
	which are not needed any more.

2021-12-20  Milan Zamazal  <mzamazal@redhat.com>

	common: Make sure passwords remain protected
	unprotect_passwords modifies the given object in place.  This is
	dangerous and may reveal passwords in the log.  Consider the
	following
	scenario:

	  log.info("Data: %s", object_with_passwords)
	  revealed = unprotect_passwords(object_with_passwords)

	Logging operations may be asynchronous and at the time the parameter
	is written to the log, it may be already modified and contain
	unprotected passwords.

	We already return the modified object from unprotect_passwords so
	there is no real need to modify the original object.  Let's modify its
	copy instead.
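
	A simplified sketch of the fix; ProtectedPassword and the helpers
	are reduced stand-ins for the real vdsm code:

```python
import copy

class ProtectedPassword:
    """Minimal stand-in for vdsm's password wrapper."""
    def __init__(self, value):
        self.value = value

def unprotect_passwords(obj):
    # Work on a deep copy so the caller's object, possibly queued for
    # asynchronous logging, is never mutated.
    obj = copy.deepcopy(obj)
    _unprotect_inplace(obj)
    return obj

def _unprotect_inplace(obj):
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = list(enumerate(obj))
    else:
        return
    for key, val in items:
        if isinstance(val, ProtectedPassword):
            obj[key] = val.value
        else:
            _unprotect_inplace(val)
```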

	Similar issues may be with protect_passwords but there we are on the
	safe side as for password security.

	Bug-Url: https://bugzilla.redhat.com/2033697

2021-12-20  Vojtěch Juránek  <vjuranek@redhat.com>

	storage: use original iface when iSER doesn't work
	When iSER is configured for the connection to an iSCSI server, we
	configure the iface accordingly and try to connect to the server. If
	it fails, we fall back to the 'default' iface. This is not correct
	as the original iface might be different. Fall back to the original
	iface instead.

	storage: move logger to module level
	Continue with polishing refactoring - move logger to module level.

	Bug-Url: https://bugzilla.redhat.com/1787192

	storage: refactor error translation
	After finishing the iSCSI connection error work, polish the former
	refactoring and move the error translation into the connection
	classes.

	Bug-Url: https://bugzilla.redhat.com/1787192

2021-12-20  Nir Soffer  <nsoffer@redhat.com>

	iscsiadm: Remove unneeded locking
	Testing shows that we don't need the iscsiadm module lock. If there
	is a locking issue in iscsiadm, this lock cannot solve the problem
	because other programs or the admin can run iscsiadm commands at the
	same time vdsm runs them.

	Concurrency is managed now by the storageServer module. It runs the
	operations that may not run correctly concurrently (e.g. modifying
	nodes) serially, and the operations that can run concurrently
	(e.g. login) in a thread pool.

	Bug-Url: https://bugzilla.redhat.com/1787192

2021-12-20  Vojtěch Juránek  <vjuranek@redhat.com>

	storage: run login to iSCSI nodes in parallel
	If an iSCSI server has multiple paths/portals and some of the portals
	are not available, engine's attempt to connect to the iSCSI server
	may time out, as we run iSCSI logins to the portals serially.
	iscsiadm login (by default) runs 8 login attempts, each timing out
	after 15 seconds, which sums up to 120 sec. for each unavailable
	portal. As the default engine command timeout is 180 sec., having
	more than one unavailable portal results in a connect command
	timeout.

	To speed up login to iSCSI portals, run the logins in parallel. If
	the number of login workers (by default 10) is higher than the
	number of unavailable portals, the command should finish after
	approximately 120 sec. and thus not time out.

	To allow users to configure the maximum number of parallel login
	threads, add a config option that limits the maximum number of
	parallel logins.
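
	The parallel login can be sketched with a bounded thread pool;
	iscsi_login and the portal list are stand-ins for the real vdsm
	helpers, and the worker limit would come from the new config option:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 10  # would be read from the new config option

def iscsi_login(target):
    # Placeholder for the real login, which may block up to ~120 sec.
    # per unavailable portal.
    return target, 0  # (connection, status)

def login_all(targets):
    # With enough workers, total time is roughly the slowest single
    # login instead of the sum of all of them.
    workers = max(1, min(MAX_WORKERS, len(targets)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(iscsi_login, targets))
```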

	Bug-Url: https://bugzilla.redhat.com/1787192

2021-12-20  Tomáš Golembiovský  <tgolembi@redhat.com>

	v2v: decode bytes input immediately
	Decode input from virt-v2v immediately. This avoids accidentally mixing
	str and bytes later on.

	Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=2026809

2021-12-16  Vojtěch Juránek  <vjuranek@redhat.com>

	storage: move udev settle into dedicated method
	After connecting to iSCSI server we wait for udev to process all new
	devices up to specified timeout (by default 5 seconds). This is called
	also from IscsiConnection._maybe_connect_iser() for no good reason,
	as there we only connect and disconnect over iSER. udev settle needs
	to be called only when we finish connecting to the iSCSI server.
	Move it into a dedicated function and call it as the last step.

	Bug-Url: https://bugzilla.redhat.com/1787192

	storage: use specific class for connection
	Instead of using the parent class for connecting prepared
	connections, return the child class from the prepare connection
	function and use it for connecting. This finally delegates the
	connection to the sub-class, and after this change we can override
	the connect function in a sub-class and implement parallel
	connection for the given sub-class.

	Mixing different types of connections is not supported, so we don't
	have to care about this case.

	Bug-Url: https://bugzilla.redhat.com/1787192

	storage: provide (dis)connect functions for storageServer module
	To provide a nicer API, implement connect() and disconnect()
	functions in the storageServer module, so that callers don't have to
	know anything about implementation details, like calling these
	functions on the Connection class.

	Better encapsulation also makes for better code, more resilient to
	future changes.

	Bug-Url: https://bugzilla.redhat.com/1787192

	storage: unify output of Connection (dis)connect methods
	After moving the connect and disconnect functionality into the
	Connection class, unify the output of the connect() and disconnect()
	methods so that both return the same output - a list of (connection,
	status) tuples. This is nicer and less confusing for the user.

	Bug-Url: https://bugzilla.redhat.com/1787192

	storage: create storage server connection parent
	Create a parent class for all storage server connections and move the
	bulk connect and disconnect functionality there. This will allow child
	classes to override these methods and implement a different way to
	create multiple connections, e.g. creating them in parallel.

	Bug-Url: https://bugzilla.redhat.com/1787192

2021-12-16  Nir Soffer  <nsoffer@redhat.com>

	tests: Fix storageServer tests
	Fix wrong test added in commit ceb07387f29285c6d689e7aaf2d06c5b9c1d3584

	    storage: move connectStorageOverIser() function into IscsiConnection

	This is probably fixed in a pending patch. Github runs only the top
	patch, but we merge the first patches.

2021-12-15  Vojtěch Juránek  <vjuranek@redhat.com>

	storage: refactor preparation of connection objects
	Move preparation of connection objects into the storageServer module
	and make the function storageServer.connectionDict2ConnectionInfo()
	private.

	To reduce the number of variables, store the whole connection in the
	results variable and construct the expected output before returning
	from the connectStorageServer() method.

	Bug-Url: https://bugzilla.redhat.com/1787192

	storage: move connect to iSCSI target into dedicated function
	To be able to run login to an iSCSI target in parallel, we need to
	decouple it from adding the node to the local iSCSI database. Move the
	login to the iSCSI target into a dedicated method.

	Bug-Url: https://bugzilla.redhat.com/1787192

	storage: add property for credentials
	To be able to access the credentials of an iSCSI connection, add a
	property for the credentials to the IscsiConnection object. It will be
	used for running login to the iSCSI target in parallel.

	Bug-Url: https://bugzilla.redhat.com/1787192

	storage: move connectStorageOverIser() function into IscsiConnection
	Encapsulating connectStorageOverIser() in the IscsiConnection object
	allows us to simplify this method a little and also unifies connecting
	to all storage servers. Now, all storage servers are connected by
	calling the connect() method on the respective object, without any
	need to call a pre-connect method specific to a given storage server.

	To check whether initiatorName was passed in as a connection parameter
	from the engine (which shouldn't be possible anyway as it's not covered
	in IscsiConnectionParameters [1]), we need to create an iface object
	for the connection. However, the implementation of iscsi.IscsiInterface
	seems to be buggy and throws a KeyError when initiatorName is not set,
	so we need to check for KeyError when accessing it. This deserves a fix
	and a whole-module revision, but that is a different task than BZ
	#1787192 and can be done later.

	[1] https://github.com/oVirt/vdsm/blob/v4.50.0/lib/vdsm/api/vdsm-api.yml#L379

	Bug-Url: https://bugzilla.redhat.com/1787192

2021-12-15  Saif Abu Saleh  <sabusale@redhat.com>

	virt: api: introduce VM screenshot
	Introduce a virt API that captures the screen of the VM using the
	libvirt screenshot API, and returns a base64-encoded image string to
	the engine.

	Response example:
	{
	    "data": "UDYKNjQwIDQ4MAoyNTUKA....",
	    "encoding": "base64",
	    "mime_type": "image/x-portable-pixmap",
	}
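
	A minimal sketch of assembling such a response, assuming the raw image
	bytes were already read from the libvirt stream (the function name is
	illustrative, not vdsm's actual code):

```python
import base64


def screenshot_response(image_bytes, mime_type):
    # Encode the raw screenshot bytes so they can travel in a JSON
    # response to the engine.
    return {
        "data": base64.b64encode(image_bytes).decode("ascii"),
        "encoding": "base64",
        "mime_type": mime_type,
    }
```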

	Bug-Url: https://bugzilla.redhat.com/1964208

2021-12-14  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: CPU hotplugging support for dedicated CPUs
	When adding more CPUs to VMs with a CPU policy, the engine has to
	dedicate the new CPUs to match the policy. For this the API has to be
	extended to allow passing the CPU sets for new VMs. The engine will
	decide whether or not it can pass the argument based on the cluster
	version. For older cluster versions this argument does not have to be
	specified even on new VDSM because VMs with a CPU policy cannot be
	used there.

	To make the API more flexible, we expect that engine passes
	configuration for all CPUs and not just the new ones. This potentially
	allows engine to relocate also the already assigned CPUs of the VM to
	optimize use of resources on the host. When decreasing the number of
	CPUs there is nothing special to do for the VM with dedicated CPUs.

	In all cases the shared CPU pool has to be updated and VMs without
	dedicated CPUs reconfigured.

	Bug-Url: https://bugzilla.redhat.com/1782077

	virt: migration support for dedicated CPUs
	When migrating the VM we need to remove any CPU pinning that was defined
	by VDSM. The reason is that pinned CPUs may serve a different purpose on
	the destination host (may be dedicated to another VM) or it may not be
	present on the destination at all.

	For VMs with no policy, the pinning is simply dropped and will be
	filled in on the destination. For VMs with manual CPU pinning or VMs
	that use NUMA auto pinning, this affects only vCPUs that don't have
	any pinning defined by the user and are using the shared CPU pool.
	Such configuration is also removed and will be filled in again on the
	destination.

	For VMs with a policy we expect the engine to pass a new pinning
	configuration to the source VDSM. The information is passed in the
	"cpusets" parameter in the form of a list. Each item of the list
	corresponds to a vCPU and contains a string with a cpuset definition.
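
	A minimal sketch of parsing one such cpuset string (illustrative, not
	vdsm's actual parser):

```python
def parse_cpuset(cpuset):
    # Parse a cpuset string such as "0-3,7" into a set of pCPU indices.
    # The format mirrors libvirt's cpuset attribute syntax.
    cpus = set()
    for part in cpuset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus
```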

	Bug-Url: https://bugzilla.redhat.com/1782077

	virt: update shared pool on VM start/stop
	To keep the list of pCPUs in the shared pool accurate, it has to be
	updated on every start and stop of a VM. More concretely:

	When starting a VM without a CPU policy, its vCPUs need to be
	configured to use pCPUs from the shared pool.

	When starting or stopping a VM with a CPU policy, the shared pool has
	to be updated. Starting from the set of all physical CPUs, CPUs of VMs
	with the dedicated, isolate-threads or siblings policy are removed
	from the set. For VMs with the isolate-threads and siblings policies
	all siblings of all their CPUs are also removed, to make sure that
	whole cores are allocated. Finally, all VMs without a policy need to
	have their vCPUs reconfigured to use only CPUs from the newly
	constructed shared pool.
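
	The pool construction above can be sketched with plain set arithmetic
	(names and data shapes are illustrative, not vdsm's actual code):

```python
def shared_pool(all_cpus, vms, siblings):
    # Start from all physical CPUs and remove CPUs of VMs with a CPU
    # policy. For isolate-threads/siblings policies also remove all
    # sibling threads, so whole cores stay exclusive to the VM.
    # vms: iterable of (policy, cpus); siblings: cpu -> set of siblings.
    pool = set(all_cpus)
    for policy, cpus in vms:
        if policy in ("dedicated", "isolate-threads", "siblings"):
            pool -= set(cpus)
            if policy in ("isolate-threads", "siblings"):
                for cpu in cpus:
                    pool -= siblings.get(cpu, set())
    return pool
```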

	Bug-Url: https://bugzilla.redhat.com/1782077

2021-12-14  Roman Bednar  <rbednar@redhat.com>

	lvm: remove deprecated function for running lvm commands
	The obsoleted cmd() is not used anymore, removing it.

	Bug-Url: https://bugzilla.redhat.com/1536880

	tests: use run_command() in all lvm tests
	The new function for running commands, run_command(), replaces
	LVMCache.cmd().

	The new function raises on failure (rc != 0), so there is no need to
	check the rc all over the place in tests.

	Also, we now need to use pytest.raises when a command failure is
	expected.
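
	A sketch of the raise-on-failure pattern (simplified; not vdsm's
	actual implementation):

```python
import subprocess


class LVMCommandError(Exception):
    # Carries the full command context for better debugging.
    def __init__(self, rc, out, err):
        super().__init__(f"Command failed rc={rc} err={err!r}")
        self.rc = rc
        self.out = out
        self.err = err


def run_command(args):
    # Run an external command and raise on failure, so callers don't
    # need to check rc all over the place.
    p = subprocess.run(args, capture_output=True, text=True)
    if p.returncode != 0:
        raise LVMCommandError(p.returncode, p.stdout, p.stderr)
    return p.stdout
```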

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in extendVG() flow
	Modify extendVG() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	The physical volume membership check can be skipped in extendVG() to
	simplify it, since lvm will fail any attempt to initialize a pv which
	is already in a different vg.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in createVG() flow
	Modify createVG() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	The default exception in hsm.createVG() can be removed. The old code
	provided context (the arguments); the new code cannot, because the
	context is now the command execution along with all important details,
	which is not available here. So having the default exception does not
	add any value at this point.

	Also resolves the following issue - the patch passed the pylint 2.10
	check:
	https://gerrit.ovirt.org/c/vdsm/+/116780/4/lib/vdsm/storage/lvm.py#1409

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() for enabling pv metadata
	Modify createVG() flow to use new run_command() (for enabling pvs
	metadata) which now raises LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	This command call can be moved to a private helper function for better
	readability.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() while reloading lvm devices
	Modify _reloadpvs(), _reloadvgs() and _reloadlvs() flows to use new
	run_command() which now raises LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in changeVGTags() flow
	Modify changeVGTags() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Replace VolumeGroupReplaceTagError with ValueError where appropriate,
	the same as was already done in changeLVsTags() here:

	https://gerrit.ovirt.org/c/vdsm/+/116780/4/lib/vdsm/storage/lvm.py#1836

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in chkVG() flow
	Modify chkVG() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	chkVG() is used by blockSD.selftest() and blockSD.validate:

	https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/blockSD.py#L1217
	https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/blockSD.py#L1231

	The callers now handle LVMCommandError and raise the correct error -
	StorageDomainAccessError.

	This error needs to contain a reason as well so we don't hide useful
	details.

	Bug-Url: https://bugzilla.redhat.com/1536880

2021-12-14  Nir Soffer  <nsoffer@redhat.com>

	backup: Keep drive object in backup disk
	Instead of copying drive.name and drive.path to the backup disk, keep
	the drive object. This makes the code a little more verbose, but will
	allow us to add scratch disk information to the drive object when a
	backup is started.

	Bug-Url: https://bugzilla.redhat.com/1913387

	backup: Improve naming
	The backup module reuses the term "drive" for the backup info of a
	particular drive. This confuses the reader, who expects "drive" to be
	a virt.vmdevices.storage.Drive object.

	Rename BackupDrive to BackupDisk, and use "backup_disk" in places when
	we used "drive". Now "drive" is used only when accessing a real Drive
	object.

	Bug-Url: https://bugzilla.redhat.com/1913387

	backup: Parse disk type from backup xml
	Based on the disk type, we can decide whether the scratch disk needs
	to be monitored and extended. When we start a backup we already have
	this info, but getting the value from libvirt means we can start
	monitoring disks at any time, for example after vdsm is restarted,
	based on libvirt state.

	Here is an example parsed backup xml:

	    {
	      'incremental': '7554d598-edaf-49ba-a476-fde54ca3c7fd',
	      'socket': '/run/vdsm/backup/e8ef325d-058e-416c-b9a2-8b9b90da811b',
	      'disks': {
	        'sda': {
	          'index': 9,
	          'exportname': 'sda',
	          'type': 'file',
	        },
	        'vda': {
	          'index': 10,
	          'exportname': 'vda',
	          'type': 'file',
	        },
	      },
	    }
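
	A sketch of producing such a dict with xml.etree; the XML shape and
	attribute placement below are illustrative, see the libvirt backup
	format documentation for the authoritative schema:

```python
import xml.etree.ElementTree as ET

# Illustrative backup XML; attribute placement is an assumption.
BACKUP_XML = """
<domainbackup mode='pull'>
  <incremental>7554d598-edaf-49ba-a476-fde54ca3c7fd</incremental>
  <server transport='unix' socket='/run/vdsm/backup/e8ef325d'/>
  <disks>
    <disk name='sda' backup='yes' type='file' exportname='sda' index='9'/>
    <disk name='vda' backup='yes' type='file' exportname='vda' index='10'/>
  </disks>
</domainbackup>
"""


def parse_backup(xml_str):
    # Parse the backup XML into the dict shape shown above.
    root = ET.fromstring(xml_str)
    backup = {
        "incremental": root.findtext("incremental"),
        "socket": root.find("server").get("socket"),
        "disks": {},
    }
    for disk in root.findall("./disks/disk"):
        if disk.get("backup") == "yes":
            backup["disks"][disk.get("name")] = {
                "index": int(disk.get("index")),
                "exportname": disk.get("exportname"),
                "type": disk.get("type"),
            }
    return backup
```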

	Bug-Url: https://bugzilla.redhat.com/1913387

	backup: Use exportname to create backup url
	Libvirt now exposes the exportname via the backup xml[1]. We can set a
	custom export name when starting a backup, but we don't do this yet,
	so we always get the drive name. Using the exportname makes the code
	future-proof in case we want to use a custom export name.

	Here is an example parsed backup xml from the tests:

	    {
	      'incremental': '90f469ad-1ed2-46fc-9851-0d5db9b8515c',
	      'socket': '/run/vdsm/backup/db250ab6-ce07-4d83-ae80-6b8c5557e956',
	      'disks': {
	        'sda': {
	          'index': 9,
	          'exportname': 'sda',
	        },
	        'vda': {
	          'index': 10,
	          'exportname': 'vda',
	        },
	      },
	    }

	[1] https://libvirt.org/formatbackup.html
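
	For illustration, the backup URL could be built from the parsed
	values like this; the exact URL scheme vdsm uses may differ:

```python
def backup_url(socket_path, exportname):
    # Build an NBD URL for the backup export, using the exportname
    # from the backup XML instead of the drive name.
    return f"nbd:unix:{socket_path}:exportname={exportname}"
```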

	Bug-Url: https://bugzilla.redhat.com/1913387

	backup: Parse incremental element
	When parsing the backup xml, also include the incremental element. We
	don't use this yet, but it will help to debug the system.

	When creating a backup info response, we can use the incremental flag
	instead of passing another argument.

	Here is an example parsed backup xml:

	    {
	      'incremental': 'ef111c34-ee9c-4ea7-afb9-861f3b87b825',
	      'socket': '/run/vdsm//backup/6cd3b5b9-7b4c-49a0-9bbc-24beacf5112e',
	      'disks': {'sda': 9, 'vda': 10},
	    }

	Bug-Url: https://bugzilla.redhat.com/1913387

	backup: Parse index from backup xml
	When starting a backup we need to get the scratch disk index for
	monitoring and extending scratch disks on block storage.

	Parsing the backup xml and creating the backup info response was split
	into two parts:

	- _get_backup() - get backup xml, parse it, and return a dict.

	- _backup_info() - create backup info response using the result of
	  _get_backup().

	Both start_backup() and backup_info() use the new functions to create
	the response, eliminating duplicate code.

	We don't have a good way to test the parsing yet, but we can verify the
	parsing manually using the debug logs:

	20:20:28,185 DEBUG   (MainThread) [test] backup_id 'c9fe8695-...-1fb634e355d9'
	info: {'socket': '/var/tmp/.../backup/c9fe8695-743e-4069-95db-1fb634e355d9',
	'disks': {'sda': 7, 'vda': 8}} (backup:223)

	start_backup() will use this later to start monitoring the scratch
	disks.

	Bug-Url: https://bugzilla.redhat.com/1913387

	automation: Remove all tests
	Remove all the tests that fail randomly in gerrit - we have fast and
	reliable CI in github now. The test jobs only overload the CI and cause
	other jobs to fail.

	Remove all the build artifacts jobs except x86_64 el8stream/rhel8, used
	by OST. Currently build artifacts jobs fail randomly in one of the
	nodes.

	The actual test files in automation/ were not removed yet, to make it
	easy to enable the tests if we have a need. We will remove the files
	once we move to github completely.

2021-12-13  Nir Soffer  <nsoffer@redhat.com>

	hsm: Remove stale lvm locking type validation
	This validation always fails since we moved to RHEL 8 with:

	    2021-12-13 12:33:47,300+0200 DEBUG (MainThread) [common.commands]
	    FAILED: <err> = b'  Configuration node global/locking_type not
	    found\n'; <rc> = 5 (commands:99)

	We could not remove this in the past since we supported an old lvm
	version on Fedora 30 that used locking_type.

	lvm: Remove lvmetad leftovers
	Since RHEL 8 lvmetad was removed, but we could not remove the code
	disabling it since we supported Fedora 30. Remove the useless code.

2021-12-10  Milan Zamazal  <mzamazal@redhat.com>

	virt: Indicate whether VM CPU stats are real or initial
	Before VM CPU stats are available, Vdsm reports zero initial values
	for them.  ovirt-hosted-engine-ha relies on those stats when handling
	the Engine VM.  The initial fake VM stats may confuse Engine VM
	monitoring and induce undesirable actions such as restarting the VM on
	another host without a good reason.

	There is no good way to distinguish the initial fake CPU stats from
	real CPU stats on the Engine side.  Let's add a new flag, cpuActual,
	distinguishing the two cases.  It is set to true when all the CPU
	stats are based on actual measured values.

	It would be better to simply omit the initial fake CPU stats.  But we
	must keep them for compatibility with Engine 4.2, which expects their
	presence.

	Bug-Url: https://bugzilla.redhat.com/2026263

2021-12-08  Nir Soffer  <nsoffer@redhat.com>

	userstorage: Retry creating loop device
	Creating a loop device with the --sector-size option may fail randomly
	if the device has dirty pages from previous usage. This was fixed in
	losetup from util-linux 2.37.1, but this version is not available in
	Centos Stream 8.

	Fixed by adding a retry loop, similar to the retry loop used internally
	in losetup.

	Here is an example random failure, fixed on the first retry:

	    [userstorage] WARNING Attempt 1/20 failed: losetup: /dev/loop5: set
	    logical block size failed: Resource temporarily unavailable
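
	The retry loop can be sketched like this (a generic helper with
	illustrative names, not the actual userstorage code):

```python
import time


def retry(func, attempts=20, delay=0.05, log=print):
    # Retry a flaky operation, similar to the retry loop used
    # internally in losetup. Re-raises the last error when all
    # attempts are exhausted.
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as e:
            if attempt == attempts:
                raise
            log(f"Attempt {attempt}/{attempts} failed: {e}")
            time.sleep(delay)
```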

	ci: Enable timestamp in rpm names
	For merged patches we want the standard git describe string as used in
	vdsm copr builds[1].

	For pull requests or local builds enable the timestamp that makes it
	easy to upgrade a local development build.

	The timestamped builds also solve the issue where OST fails to run
	your patch because it is considered older than the latest build. With
	this change, a patch built in github is always newer than the latest
	build with the same version (e.g. v4.50.0.2).

	Example merged patch build:
	https://github.com/nirs/vdsm/runs/4408896025

	Example pull request build:
	https://github.com/nirs/vdsm/actions/runs/1535544653

	[1] https://copr.fedorainfracloud.org/coprs/ovirt/ovirt-master-snapshot/package/vdsm/

2021-12-08  Ales Musil  <amusil@redhat.com>

	net, tests: Load bonding module on GH actions
	Create and then remove a bond to load the module.
	Since we are already running inside a container
	we cannot simply use modprobe bonding. Fortunately
	creating a bond through iproute2 loads the module for us.

	net, tests: Enable IPv6 on GH actions
	IPv6 is disabled in docker containers on GH actions.
	Enable IPv6 and remove skips for working tests.

2021-12-07  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.0.3

2021-12-07  Roman Bednar  <rbednar@redhat.com>

	lvm: use run_command() in removeLVs() flow
	Modify removeLVs() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	The original error (CannotRemoveLogicalVolume) is used as a wrapper
	for other errors, so we cannot change it to inherit from
	LVMCommandError, which is what we need in removeLVs(). In this case we
	can add a new exception for this flow - LogicalVolumeRemoveError.

	Bug-Url: https://bugzilla.redhat.com/1536880

2021-12-06  Vojtěch Juránek  <vjuranek@redhat.com>

	storage: move storage server related helpers from hsm
	The HSM module is huge and contains various functions related only to
	some other module, e.g. helpers for connecting to storage servers.
	Move these functions partially into the storageServer module, and
	those related only to iSCSI connections into the iscsi module.

	Bug-Url: https://bugzilla.redhat.com/1787192

2021-12-03  Milan Zamazal  <mzamazal@redhat.com>

	tests: Fix a typo in a password test utility function

2021-12-03  Marcin Sobczyk  <msobczyk@redhat.com>

	ci: Introduce 'tests' github job
	This patch introduces another job to github action-based CI.

	This time we're running 'tests' inside a privileged container
	based on our 'vdsm-test-centos-8' image.

2021-12-03  Nir Soffer  <nsoffer@redhat.com>

	gitlab: Add rpm job
	Unlike github, the archive contains the directory "exported-artifacts".

	Example build:
	https://gitlab.com/nirs/vdsm/-/pipelines/421515813/builds

2021-12-03  Martin Perina  <mperina@redhat.com>

	Remove dynamic detection of 4.5/4.6 supportability
	As VDSM requires libvirt >= 7.6.0 in RPM spec file, there is no need for
	dynamic detection of 4.5 (libvirt >= 6.6) and 4.6 (libvirt >= 7.0)
	cluster levels supportability and we can add those cluster levels into
	the static list along with other older cluster levels.

	core: Add support for cluster level 4.7 and engine 4.5
	This patch adds support for cluster compatibility level 4.7. VDSM will
	report support for 4.7 level only when it's running on RHEL 8.6 or
	CentOS Stream/RHEL 9 (libvirt >= 7.10.0).
	This patch also adds support for ovirt-engine 4.5.

	Bug-Url: https://bugzilla.redhat.com/2021545

2021-12-02  Sandro Bonazzola  <sbonazzo@redhat.com>

	reduce W504 issues
	reduced W504 issues by covering init, static and vdsm_hooks directories.
	Updated statistics.

	tox: add note about W503, updated stats

	pep8: fix E203
	fixed errors E203 (https://www.flake8rules.com/rules/E203.html)
	detected by flake8.

	Excluded from the test specific lines which are controversial and would
	make Black unhappy while making flake8 happy.

	Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=2004412

2021-12-02  Nir Soffer  <nsoffer@redhat.com>

	lvmfilter: Don't use /dev/disk/by-id/lvm-pv-uuid-*
	When the device is a multipath device, the udev link points to the
	actual path (/dev/sda) during early boot. The link is updated to point
	to the multipath device only later. This creates a race during early
	boot when lvm can grab the device before multipath.

	Since lvm2-2.03.14-1.el8.x86_64 (Centos Stream 8) oVirt node boot breaks
	when using "stable" udev links. According to David Teigland:

	    Using a link with the PVID may have sort of worked in the past, but
	    it probably should not have.  I'd call it accidental, it's depending
	    on a quirk of udev processing, not anything in lvm itself.

	The change in lvm may be reverted, but I think we should get rid of the
	udev "stable" links anyway.

	This change reverts commit db13e4bc58460297301f9bc7fa4d1234d546a6b6

	    lvmfilter: Use /dev/disk/by-id/lvm-pv-uuid devlinks for pv naming

	but it is not possible to simply revert the commit since additional
	code was added based on it. Instead we changed the behavior:

	1. When computing a filter, use the device names reported by lvm.
	   This fixes the attached bug. When lvm reports /dev/mapper/xxx this is
	   the device that will be used in the filter.

	2. When analyzing existing filter, recommend to replace filter including
	   "stable" udev links with filter including the device names.

	The original bug[1] will be solved in 4.5 by using the new lvm devices
	feature[2].

	[1] https://bugzilla.redhat.com/1635614
	[2] https://bugzilla.redhat.com/2012830

	Bug-Url: https://bugzilla.redhat.com/2016173
	Bug-Url: https://bugzilla.redhat.com/2026370
	Related-to: https://bugzilla.redhat.com/2026640

2021-12-02  Marcin Sobczyk  <msobczyk@redhat.com>

	ci: Introduce 'rpm' github job
	This patch introduces an 'rpm' github job.

	Apart from creating RPMs themselves the new script also uses
	'createrepo_c' to create DNF repo data in the exported directory.
	This allows using zip archives created by GH Actions like
	"portable repositories" that will be passed to OST runs in the future.

2021-12-02  Nir Soffer  <nsoffer@redhat.com>

	gitlab: Add CI pipeline
	Start simple pipeline, running lint and tests-storage jobs. We will add
	more tests once they are added to github CI workflow.

2021-12-01  Nir Soffer  <nsoffer@redhat.com>

	README.md: Add Github CI status badge
	Replace the dead Travis badge with Github actions badge.

	github: Add storage tests job
	Add job running "make tests-storage".

	We use the --privileged option since creating loop devices in the
	container does not work without it in github actions. This works in
	gitlab CI, so we probably have a better way to do this, but let's
	start with something that works on github.

2021-12-01  Marcin Sobczyk  <msobczyk@redhat.com>

	containers: Add 'createrepo_c' to the image
	This patch adds 'createrepo_c' package to the container image.
	This will allow us to create DNF repository data when building
	RPMs in GitHub Actions.

	ci: Introduce github actions-based linters
	This patch adds a simple github job definition that runs linters in our
	container image on every push.

2021-12-01  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: use libvirt API to gather disk information
	We used to call qemu-ga directly with the guest-get-disks command to
	get the names and serial numbers of all disks in the VM. We can now
	use the API provided by libvirt (since 7.3.0).

	Bug-Url: https://bugzilla.redhat.com/1919857

2021-11-30  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: shared pool creation
	NUMA code is extended to build a list of CPUs on a core for each core
	available. This is then used to build a list of all CPUs in a shared
	pool -- i.e. CPUs available to all VMs without any specific CPU policy.

	The list is built in a way that:

	- CPUs of VMs with manual CPU pinning or NUMA auto-pinning policy are
	  included in the shared pool

	- CPUs of VMs with the dedicated policy are excluded from the shared
	  pool

	- CPUs of VMs with the isolate-threads or siblings policy are
	  excluded from the shared pool, and all their siblings as well, so
	  that whole cores are removed from the shared pool and left
	  exclusive to the particular VM

	The updates need to be exclusive and cannot run concurrently,
	otherwise the assigned CPU sets may be wrong. Two racing VM.destroy
	calls are problematic because we could fail to enlarge the shared
	pool, so the configured CPU set would be smaller than it could be for
	some VMs. Racing VM.destroy and VM.create is much more problematic as
	it can lead to situations where a dedicated CPU would be used by
	other VMs.

	Bug-Url: https://bugzilla.redhat.com/1782077

	numa: drop @cache.memoized and cache capabilities manually
	The _numa() call is cached (memoized) based on the arguments used when
	calling the function. The _numa() function optionally took libvirt
	capabilities as an argument. This argument, however, was never used,
	except in tests. This means that _numa() was normally called only
	without arguments, in which case libvirt capabilities were evaluated
	only the first time the function was called (by the _numa() function
	itself).

	A recent patch changed how we treat _numa() in the Host.getCapabilities
	API call. We fetch fresh libvirt capabilities and pass them as an
	argument to _numa(). This causes two problems. First, it now causes a
	small leak, because the cache is allowed to grow indefinitely. Second,
	the results of the re-evaluated _numa() call are not available to the
	rest of the VDSM code that calls _numa() without arguments (e.g. in
	sampling.py for every sample).

	The first problem could be easily solved by changing the memoizing
	decorator to functools.lru_cache(maxsize=N). But to also solve the
	second problem, the caching cannot be done based on arguments. We need
	something special, because we don't want to fetch the libvirt
	capabilities every time we need to call _numa() and we also don't want
	to re-evaluate _numa() on every call.

	This patch removes the @cache.memoized decorator and creates caching
	in numa.py. There is a separate update call that fetches fresh libvirt
	capabilities. The capabilities are examined only when they change.
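
	The caching scheme described above can be sketched as follows
	(illustrative, not the actual numa.py code):

```python
class CapsCache:
    # Cache a value derived from libvirt capabilities, re-evaluating
    # only when the capabilities actually change, instead of memoizing
    # on call arguments.
    def __init__(self, compute):
        self._compute = compute
        self._caps = None
        self._value = None

    def update(self, caps):
        # Called with freshly fetched capabilities XML.
        if caps != self._caps:
            self._caps = caps
            self._value = self._compute(caps)

    def topology(self):
        # Argument-free accessor used by the rest of the code.
        return self._value
```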

	virt: store CPU policy and pinned CPUs in metadata
	This is the first patch of the CPU policies series. We should keep the
	CPU policy in the VM metadata. If it is not there, it has to be
	detected and stored there so that we know it on recovery. Otherwise,
	when we start managing the vCPUs and add pinning, it may be falsely
	mistaken for manual CPU pinning. For the same reason we also store
	which vCPUs are manually pinned. Later, the vCPUs without pinning will
	be using pCPUs from the shared pool, and we need to remember which
	vCPUs were pinned by the user and which have pinning defined by VDSM.

	Defined policy names are:

	- none: no policy defined, CPUs from shared pool will be used
	- pin: manual CPU pinning or NUMA auto-pinning policy
	- dedicated: each vCPU is pinned to single pCPU that cannot be used by
	  any other VM
	- siblings: like dedicated but physical cores used by the VM are blocked
	  from use by other VMs
	- isolate-threads: like siblings but only one vCPU can be assigned to
	  each physical core

	Related feature page for policies is:
	https://ovirt.org/develop/release-management/features/virt/dedicated-cpu.html

	Bug-Url: https://bugzilla.redhat.com/1782077

	virt: get real vCPU count instead of maximum
	The element value does not have to correspond to the real vCPU count.
	It is the maximum number of vCPUs a VM can have, but the actual count
	may be lower to make CPU hot-plugging possible. The actual count is
	specified in the (optional) attribute "current". Only if the attribute
	is not present is the element value the real count.

	The original implementation returned None in case there was no <vcpu>
	element present. But a missing <vcpu> suggests a broken domain XML, so
	now we raise an exception instead.
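
	A minimal sketch of the parsing logic (illustrative, not the actual
	vdsm code):

```python
import xml.etree.ElementTree as ET


def vcpu_count(dom_xml):
    # Return the actual vCPU count: the "current" attribute if present,
    # otherwise the <vcpu> element value. Raise if <vcpu> is missing,
    # since that suggests a broken domain XML.
    vcpu = ET.fromstring(dom_xml).find("vcpu")
    if vcpu is None:
        raise LookupError("missing <vcpu> element in domain XML")
    return int(vcpu.get("current", vcpu.text))
```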

2021-11-30  Harel Braha  <hbraha@redhat.com>

	vdsm: remove all uses of 'abrt'
	Bug-Url: https://bugzilla.redhat.com/2015093

2021-11-28  Nir Soffer  <nsoffer@redhat.com>

	tox: Use 4 pylint jobs by default
	Testing shows a 43% speedup with 4 jobs. Using more jobs does not help.

	With 2 jobs:

	PROFILE {
	    "command": [ "pylint", "-j2", ...],
	    "cpu": 199.93449966241778,
	    "elapsed": 28.412290573120117,
	    "maxrss": 436784,
	    "minflt": 255064,
	    "nivcsw": 269,
	    "nvcsw": 92237,
	    "stime": 1.432198,
	    "utime": 55.373773
	}

	With 4 jobs:

	PROFILE {
	    "command": ["pylint", "-j4", ...],
	    "cpu": 357.232159197985,
	    "elapsed": 19.80775475502014,
	    "majflt": 1,
	    "maxrss": 342928,
	    "minflt": 374020,
	    "nivcsw": 309,
	    "nvcsw": 4179,
	    "stime": 1.4095689999999998,
	    "utime": 69.350101
	}

2021-11-26  Marcin Sobczyk  <msobczyk@redhat.com>

	makefile: Fix the path to 'sitecustomize.py'
	In [1] we've moved 'sitecustomize.py' from 'usr/share' to 'libexec'
	directory, but we've never changed the path in this makefile.
	This makes pylint complain with:

	 ************* Module static/usr/share/vdsm/sitecustomize.py
	 static/usr/share/vdsm/sitecustomize.py:1:0: F0001: No module named static/usr/share/vdsm/sitecustomize.py (fatal)

	though it still returns 0. Let's use the new path.

	[1] I024e1dde80c6b9589c6e4a3c8c3416dbf98195c5

	containers: Add 'rpm-build'
	'rpm-build' is the only missing dependency that's preventing us from
	being able to use the containers for building RPMs. Let's add
	it to the image.

2021-11-22  Nir Soffer  <nsoffer@redhat.com>

	lvm: Remove read_only mode
	Read only mode is not useful since RHEL 8; remove it.

	Bug-Url: https://bugzilla.redhat.com/2025527

	tests: Don't test read only mode
	Read only mode is not useful since RHEL 8. Remove tests for read only
	mode or changes to read only mode during tests.

	Bug-Url: https://bugzilla.redhat.com/2025527

	lvm: Don't use global:locking_type
	This option is deprecated and useless since RHEL 8. We could not remove
	it in the past since we had to support older version of lvm on Fedora.

	Looks like this option became harmful in lvm2-2.03.14-1.el8.x86_64,
	converting locking_type=4 to --readonly. We see this failure in vdsm
	tests:

	    WARNING: locking_type (4) is deprecated, using --sysinit --readonly.
	    Operation prohibited while --readonly is set.
	    Can\'t get lock for b4512b9d-84dc-43ba-865d-32c4a1cd148a.

	and the lvm command fails.

	Converting locking_type=4 to --readonly does not look correct, so this
	is likely a regression in lvm. But we should not use locking_type in
	vdsm. This option is used only in tests using read only mode, which is
	never used in the real application.

	As a quick fix to unbreak the tests, remove the locking_type
	configuration. We need to remove the tests for read only mode and remove
	the entire read only mode feature later.

	Bug-Url: https://bugzilla.redhat.com/2025527

	automation: Log lvm2 version
	We need to know which lvm version is running in the CI environment.

	Related-to: https://bugzilla.redhat.com/2025527

2021-11-19  Nir Soffer  <nsoffer@redhat.com>

	virt: Remove vmdevices.storage.BlockInfo
	It was replaced by drivemonitor.BlockInfo, and no code is using it now.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Replace blockInfo with block stats
	Replace the call to blockInfo() with DriveMonitor.get_block_stats(),
	returning block info for all volumes. To extract the block stats for the
	drive active volume we need the index of the drive.

	We get block stats only if we have drives that should be extended, or
	when we pre-extend a drive when starting replication.

	getExtendInfo() was modified to amend block info from libvirt with
	information from the replica in case the drive is not chunked but
	replicating to a chunked replica. This method should be removed once we
	start using libvirt block stats for the replica drive.

	Bug-Url: https://bugzilla.redhat.com/1913387

2021-11-19  Milan Zamazal  <mzamazal@redhat.com>

	virt: Make formatting in Vm.maybe_kill_paused prettier
	Introducing a helper method for the condition is satisfying enough to
	all the reviewers.

2021-11-18  Nir Soffer  <nsoffer@redhat.com>

	vm: Split the query from loop
	Split the query for getting drives to extend from the loop extending the
	drives. In the next patch we want to get block stats from libvirt if we
	have drives to extend, and pass the result down to the code extending
	the drives.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Fix handling of ImprobableResizeRequestError
	The try block covered the entire loop, while it should cover only the
	call that may fail. If the error was raised, we return False even if
	some drives were extended, which is incorrect.

	Bug-Url: https://bugzilla.redhat.com/1913387

	tests: Remove unused fake blockInfo
	Since we stopped calling getExtendInfo() during live merge, we don't
	need to mock blockInfo().

	Bug-Url: https://bugzilla.redhat.com/1913387

	livemerge: Do not use getExtendInfo for base volume
	When extending base drive before live merge we used Vm.getExtendInfo()
	to get the current block info. This is wrong for several reasons:

	- The function returns block info for the drive active volume, instead
	  of the base volume for the merge, which is never the active volume.

	- The function will replace Drive.blockinfo with the new block info,
	  which is an unwanted side effect when we try to extend the base volume.

	- It calls libvirt for no reason.

	Since we need the volume capacity, add the capacity to job.extend dict
	when starting a merge, so we don't need to get it from the storage API
	on each call to _start_extend().

	Bug-Url: https://bugzilla.redhat.com/1913387

	drivemonitor: Fetch block info using block stats API
	When extending volumes, we use libvirt.virDomain.blockInfo() to get
	volume allocation. This API is easy to use, but it does not work for
	backup scratch disks, volumes in the backing chain, or blockCopy
	destination volume.

	We want to replace usage of libvirt.virDomain.blockInfo() with
	libvirt.virConnect.domainListGetStats(), which works for all block
	nodes, including backup scratch disks[1] (since RHEL 8.6).

	Vdsm already collects block stats for sampling purposes, but we cannot
	use this code:

	- We need the allocation info at the time of the call, and sampling
	  collects values only every 15 seconds.
	- Sampling does not collect info for the backing chain (and should not)
	  but we must collect info for the backing chain.
	- Sampling collects all stats while we need only block stats.
	- Sampling collects stats for all VMs, while we need only a single VM.
	- Sampling skips non-responsive VMs, while we don't skip blockInfo()
	  calls.
	- Drive monitoring cannot depend on sampling, a subsystem with different
	  requirements (best effort) and maintained by different teams.

	Using libvirt.virConnect.domainListGetStats() is tricky, since it
	requires a libvirt.virDomain object as parameter, and this object is
	wrapped by the VM._dom object. Since the VM owns the _dom object, it is
	natural to provide a method to get block stats in the VM object:
	VM.get_block_stats().

	Another problem with libvirt.virConnect.domainListGetStats() is the
	unhelpful return value, a flat mapping of "block.N.KEY" to VALUE for
	all block nodes:

	    {
	        ...
	        "block.0.fl.times": 0,
	        "block.1.name": "sda",
	        "block.1.path": "/rhev/.../44d498a1-54a5-4371-8eda-02d839d7c840",
	        "block.1.backingIndex": 2,
	        "block.1.rd.reqs": 13448,
	        "block.1.rd.bytes": 415614976,
	        "block.1.rd.times": 9940902315,
	        "block.1.wr.reqs": 4909,
	        "block.1.wr.bytes": 82999296,
	        "block.1.wr.times": 47469574949,
	        "block.1.fl.reqs": 683,
	        "block.1.fl.times": 4204366339,
	        "block.1.allocation": 216006656,
	        "block.1.capacity": 6442450944,
	        "block.1.physical": 7113539584,
	        "block.1.threshold": 6576668672,
	        "block.2.name": "sda",
	        ...

	There is a lot of information that is not needed in the monitoring
	context, and no easy way to extract the single value we actually need.
	A new method, DriveMonitor.get_block_stats(), was added to extract this
	info in a usable form:

	    {
	        2: drivemonitor.BlockInfo(
	            index=2,
	            name='sda',
	            path='/rhev/.../44d498a1-54a5-4371-8eda-02d839d7c840',
	            allocation=216006656,
	            capacity=6442450944,
	            physical=7113539584,
	            threshold=6576668672,
	        ),
	        ...
	    }

	We will use this in extend flows to fetch block info for all volumes
	when we try to extend drives.
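
	The reduction described above can be sketched as follows; this is an
	illustrative stand-in, not vdsm's actual implementation, and keying the
	result on backingIndex is an assumption based on the example output:

```python
from collections import namedtuple

# Fields mirror the example in the commit message.
BlockInfo = namedtuple(
    "BlockInfo",
    "index name path allocation capacity physical threshold")

def block_stats_to_info(stats):
    """Group flat "block.N.KEY" entries by N, keeping only the
    fields needed for drive monitoring."""
    groups = {}
    for key, value in stats.items():
        if not key.startswith("block."):
            continue
        _, n, field = key.split(".", 2)
        groups.setdefault(int(n), {})[field] = value
    result = {}
    for n, fields in groups.items():
        # Use the backing index as the key when reported (assumption).
        index = fields.get("backingIndex", n)
        result[index] = BlockInfo(
            index=index,
            name=fields.get("name"),
            path=fields.get("path"),
            allocation=fields.get("allocation"),
            capacity=fields.get("capacity"),
            physical=fields.get("physical"),
            threshold=fields.get("threshold"),
        )
    return result
```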

	[1] https://bugzilla.redhat.com/2017928

	Bug-Url: https://bugzilla.redhat.com/1913387

	virdomain: Allow access to the underlying libvirt.virDomain
	Some code needs access to the underlying libvirt.virDomain object. Add
	a public accessor so we don't access private attributes.

	The underlying virDomain is needed when calling
	libvirt.virConnect.domainListGetStats(), which requires a list of
	virDomain objects. It should not be used for anything else.

	Bug-Url: https://bugzilla.redhat.com/1913387

	tests: Don't hide logs using vm logger
	FakeVm was using FakeLogger, which hides all the logs using the vm
	logger. The fake logger class should be used only for testing logging;
	when running tests we want to see real logs when a test fails.

	tests: Generate backup xml automatically
	When we start a backup, we need to parse libvirt backup xml to get the
	index for the backup scratch disk. This requires that we have a backup
	xml for every test running backup.start_backup(). We have 22
	invocations, so setting this manually is not the way.

	FakeDomainAdapter now generates the backup xml from the backup_xml
	argument to backupBegin(). Since we always have a backup xml, we can
	verify that the backup was started correctly by comparing the backup
	xml instead of keeping and comparing the input xml.

	A similar change is needed for verifying the checkpoint xml.

	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Pass BlockInfo instead of unpacking it
	Vm.getExtendInfo returns a BlockInfo named tuple, but we unpacked it and
	sent multiple arguments. When we pass the BlockInfo to the drive
	monitor, we send it instead of unpacking it. This simplifies the code
	and includes more context when we log the named tuple.

	This will be even more useful when BlockInfo returns more information
	about the block node, like the backing index, drive name, and path.

	Bug-Url: https://bugzilla.redhat.com/1913387

	tests: Remove python 2 future imports
	Bug-Url: https://bugzilla.redhat.com/1913387

	tests: Use transientdisk.disk_path()
	When generating backup input xml, use transientdisk.disk_path(). The
	cleanest way to do this is to use temporary variables, and include them
	in the xml using an f-string (introduced in Python 3.6).

	Bug-Url: https://bugzilla.redhat.com/1913387

	transientdisk: Make owner_dir and disk_path public
	The module uses private helpers to generate the transient disk owner
	directory and disk path. These paths are needed for testing backup, so
	we don't need to duplicate transient disk code.

	Bug-Url: https://bugzilla.redhat.com/1913387

	tests: Remove unneeded globals
	DOMAIN_ID was used only once when generating drives info.

	VOLUME_ID was used only once, to create drives with the same volume id,
	which is an invalid configuration.

	We now create a new uuid instead of using a global.

	Bug-Url: https://bugzilla.redhat.com/1913387

	tests: Access drives via fake vm
	Pass the fake vm to all the helper functions that need to access the
	drive list. This will make it possible to create different drive
	configurations, for example block-based drives that need to be extended
	during backup.

	Now that the fake vm is available in the helper functions, we can use
	its id attribute instead of duplicating it.

	Bug-Url: https://bugzilla.redhat.com/1913387

2021-11-18  Milan Zamazal  <mzamazal@redhat.com>

	virt: Derive VM kill timeout from io_timeout
	We have been using vm_kill_paused_time config option to specify the
	time after which we can kill VMs with "kill" resume behavior that are
	in paused state due to an I/O error.  If a user modifies sanlock
	timeout settings, the option must be adjusted accordingly.

	The sanlock I/O timeout is now configurable using sanlock.io_timeout
	option.  The VM killing timeout can be directly computed from it by
	multiplying it by 8 and should no longer be taken from a different
	option.  This patch removes the vm_kill_paused_time option and computes
	the corresponding value from io_timeout.
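
	The computation itself is a single multiplication; a minimal sketch,
	assuming sanlock's default 10 second I/O timeout (the helper name is
	invented for illustration):

```python
SANLOCK_DEFAULT_IO_TIMEOUT = 10  # seconds; assumed default, for illustration

def vm_kill_paused_timeout(io_timeout=SANLOCK_DEFAULT_IO_TIMEOUT):
    # The kill timeout for paused VMs is derived from the sanlock I/O
    # timeout instead of being read from a separate config option.
    return io_timeout * 8
```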

	Bug-Url: https://bugzilla.redhat.com/2010205

2021-11-18  Ales Musil  <amusil@redhat.com>

	net: Delegate privileged operations to supervdsm in DHCP monitoring
	The DHCP monitoring was recently switched to use netlink monitoring,
	which makes the whole process simpler; however, there was a major
	oversight. The DHCP monitor runs in an unprivileged context (vdsmd),
	which brings two issues:

	1) We cannot set up the source route rules because this needs to be
	done in the root context (supervdsmd).
	2) The pool of items that are monitored is operated by supervdsmd,
	and vdsmd couldn't see the available items. That resulted in skipping
	every valid opportunity to notify engine about a new IP and to create
	the source route rules.

	In order to fix that, the dhcp monitor is kept in vdsmd but the
	monitoring check and removal are delegated to supervdsmd. The same
	goes for the setup of source route rules.

2021-11-18  Marcin Sobczyk  <msobczyk@redhat.com>

	stdci: Run linters on psi
	We want to run the linters in PSI as well. Let's define this CI
	substage in the same way the others are.

2021-11-17  Harel Braha  <hbraha@redhat.com>

	New release: 4.50.0.2

2021-11-17  Nir Soffer  <nsoffer@redhat.com>

	automation: Split storage tests to its own job
	Storage tests take more than 50% of the time (10+ min), and need a more
	complicated setup and teardown. The legacy vdsm tests still use nose and
	the outdated pywatch timeout that needs python debuginfo.

	We also have timeout issues in storage tests when running on the
	el9stream nodes. This is likely an issue on the server running these
	vms, but maybe we can run other tests on these nodes.

	Having two jobs running in parallel should speed up the tests, be more
	reliable, and improve coverage for el9stream.

2021-11-16  Marcin Sobczyk  <msobczyk@redhat.com>

	build: Add missing '-y' flag to dnf
	We're missing '-y' flag for dnf when installing vdsm build dependencies,
	which makes the dnf execution interactive and fails the CI pipeline on
	rhel8:

	 ...
	 [2021-11-16T10:00:26.857Z] Total download size: 19 k
	 [2021-11-16T10:00:26.857Z] Is this ok [y/N]: Operation aborted.

	Let's switch to the more recent 'dnf builddep' instead of 'yum-builddep'
	on this occasion as well.

2021-11-15  Nir Soffer  <nsoffer@redhat.com>

	monitor: Do not tear down domain during shutdown
	During vdsm shutdown, we must keep storage live, since VMs or image
	transfers may use the storage domain. We had special check for the host
	id, keeping it alive during shutdown, but we were missing similar check
	for teardown.

	In the past StorageDomain.teardown() was not effective, but during 4.4
	we fixed it several times, and now it really tears down the storage
	domain; shutting down vdsm deactivates entire volume groups and
	removes device mapper devices for logical volumes.

	When logical volumes are used, we see these errors during shutdown:

	2021-11-14 14:00:03,911+0200 INFO  (monitor/313e6d7) [storage.blocksd]
	Tearing down domain 313e6d78-80f7-41ab-883b-d1bddf77a5da (blockSD:996)

	2021-11-14 14:00:03,911+0200 DEBUG (monitor/313e6d7) [common.commands]
	/usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm vgchange
	--config 'devices ... --available n 313e6d78-80f7-41ab-883b-d1bddf77a5da
	(cwd None) (commands:154)

	2021-11-14 14:00:09,114+0200 DEBUG (monitor/313e6d7) [common.commands]
	FAILED: <err> = b'  Logical volume 313e6d78-80f7-41ab-883b-d1bddf77a5da/ids
	in use.\n  Can\'t deactivate volume group "313e6d78-80f7-41ab-883b-d1bddf77a5da"
	with 1 open logical volume(s)\n'; <rc> = 5 (commands:186)

	If we have logical volumes in use, tearing down the storage domain will
	leave them active, so running VMs and active image transfers are safe.

	However, failed LVM commands are retried several times, which slows
	down the shutdown process, and shutting down is likely to time out.

	I think this may be related to the hosted engine local maintenance
	issue.

	Bug-Url: https://bugzilla.redhat.com/2023344
	Related-to: https://bugzilla.redhat.com/1986732

2021-11-15  Ales Musil  <amusil@redhat.com>

	net: Fix source route definition when missing
	A source route that was deleted or somehow went missing on the host
	could lead to an unfixable out-of-sync status in engine. The code for
	source routes tried to optimize generating the new source route state
	by checking if the gateway had changed between configurations. The
	consequence is that if the gateway did not change, the source route
	wouldn't be configured again. To prevent that, stop checking if the
	gateway has changed between configs. This approach has already been
	used for default route networks.

	Bug-Url: https://bugzilla.redhat.com/2022354

2021-11-14  Nir Soffer  <nsoffer@redhat.com>

	tests: Remove unused import
	It seems that OST is broken - it merged a patch without CI+1, and now
	CI fails on master.

2021-11-13  Roman Bednar  <rbednar@redhat.com>

	livemerge: recover from failed pivot attempt
	During live merge we call syncVolumeChain multiple times to make sure
	the actual/libvirt chain is synced to current/vdsm chain.

	When pivot starts, the new requested chain is passed to syncVolumeChain,
	which compares it to the current vdsm chain. This way we can tell which
	volume is being removed.

	If the volume being removed is a leaf/active layer, it is flagged as
	ILLEGAL in vdsm to prevent usage.

	Then a libvirt blockjob (abort) is started, and if it fails the old
	code never recovered the volume from the ILLEGAL state, so manual
	intervention was required.

	This patch adds a helper to switch the leaf volume back to LEGAL and
	calls the helper if libvirt fails to abort the block job.

	Bug-Url: https://bugzilla.redhat.com/1949475

	image: allow leaf legality status recovery when syncing chain
	Sync volume chain needs to be able to recover the top volume legality.
	This sync function can be used after libvirt fails the pivot, to make
	sure the top volume in the vdsm chain is not left in an ILLEGAL state.

	Bug-Url: https://bugzilla.redhat.com/1949475

	tests: add pivot test with unavailable storage
	tryPivot() can now raise when marking volume illegal if storage is not
	available.

	CleanupThread.run() handles this error and sets cleanup thread state to
	FAILED - this test should verify this behavior.

	Bug-Url: https://bugzilla.redhat.com/1949475

2021-11-12  Milan Zamazal  <mzamazal@redhat.com>

	virt: Add debugging log for guest event status
	It has been observed that a VM without a guest agent got stuck in
	RebootInProgress status forever.  Let's add a debugging log statement
	to log the related times.

2021-11-10  Ales Musil  <amusil@redhat.com>

	net, tests: Fix network tests on PSI CI

2021-11-09  Vojtěch Juránek  <vjuranek@redhat.com>

	tests: add more test types for logging in to iscsi targets
	To better investigate iscsiadm behaviour, add more options for adding
	and logging in to the iscsi nodes:
	- add nodes and login to them in a serial manner; this approach is
	  used in vdsm 4.4 and previous versions
	- add nodes serially and login to them concurrently
	- add nodes and login to them concurrently
	- add nodes serially and use `--loginall` to login to all nodes

	This should give us a hint whether running the vdsm add-node flow
	concurrently is safe or not.

2021-11-08  Vojtěch Juránek  <vjuranek@redhat.com>

	tests: change default concurrency level
	After adding a dedicated option for the login-all method, there's no
	need to use the concurrency option as a branching criterion for which
	method should be used for login.
	Also, in the following patch another option for running add and login
	in parallel will be added. There's no point in having the default
	concurrency level set to zero. Change the default concurrency level
	to 4.

2021-11-08  Ales Musil  <amusil@redhat.com>

	net, tests: Wait in port mirroring test
	Instead of sleeping, wait with a 2 second timeout, which should be
	enough time to get the mirrored packet.

	init: Move vdsmd and supervdsmd to /usr/libexec
	/usr/share is not the right place for the service binaries; since
	neither of them is meant to be executed as a separate binary, move
	them to /usr/libexec.

	From a security point of view, /usr/share is not a valid place for
	executables; default rpmdb parsing in fapolicyd causes issues with
	the vdsm service in /usr/share.

	Bug-Url: https://bugzilla.redhat.com/2015802

2021-11-06  Nir Soffer  <nsoffer@redhat.com>

	virt: Improve comments
	- livemerge: Explain why we lie about the current volume size when
	  extending the base volume before starting commit.

	- drivemonitor: Add missing info about handling a missing allocation
	  when checking if a drive should be extended.

	Bug-Url: https://bugzilla.redhat.com/1913387

2021-11-04  Vojtěch Juránek  <vjuranek@redhat.com>

	tests: add dedicated option for login into all targets
	Add a dedicated option for the login-into-all-targets method. This is
	not the current vdsm flow and also has its drawbacks, but it can
	probably be considered as an option if concurrent login turns out not
	to be a usable/safe way to log in to multiple targets.

	Also add a docstring for this method, describing its drawbacks.

	tests: fix a typo in argument help text

	iscsi: configure node startup before login
	Currently the node startup policy configuration is done after login to
	the target, but if it fails, we are left with the node default config,
	which is `automatic`. Configure the iscsi node startup policy to
	`manual` as the first thing after adding the node.

2021-11-02  Roman Bednar  <rbednar@redhat.com>

	image: improve log messages for volume chain sync
	Fix indentation and improve log messages.

	Bug-Url: https://bugzilla.redhat.com/1949475

	livemerge: add helper for marking leaf volume illegal
	Adding a helper for changing volume legality improves code readability.

	A similar helper will be added in a followup patch to mark the leaf
	legal.

	Bug-Url: https://bugzilla.redhat.com/1949475

	livemerge: do not ignore errors when syncing volume chain
	Use a wrapper function vm.imageSyncVolumeChain() when trying to pivot
	which does not ignore errors.

	This is an actual fix for bug 2018947.

	Related-Bug: https://bugzilla.redhat.com/1949475
	Bug-Url: https://bugzilla.redhat.com/2018947

	vm: add imageSyncVolumeChain() wrapper
	The imageSyncVolumeChain() function available through client interface
	does not raise and returns a result dict instead.

	Each caller would then have to check this dict for status and code.
	This can be improved by adding a wrapper that checks the code.

	Followup patches show how this wrapper is used.

	Related-Bug: https://bugzilla.redhat.com/1949475
	Bug-Url: https://bugzilla.redhat.com/2018947

2021-11-01  Filip Januska  <fjanuska@redhat.com>

	virt: Create "_timedDesktopLock()" thread with concurrent.thread
	After a user disconnects from a vm console, the "onDisconnect" method
	creates a new thread using threading.Timer class. This thread then
	executes "_timedDesktopLock" after a 2 second delay.

	Since all threads in vdsm should be created using the "concurrent"
	module, in order to provide some useful threading facilities
	(thread name, log exceptions on errors etc.), this patch replaces
	the Timer thread with a concurrent.thread. The 2 second delay is
	replaced by a time.sleep call.
	Since this thread only executes the "_timedDesktopLock" method and
	is destroyed right after, stopping the control here should be fine.
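
	A minimal sketch of the pattern described above, using a plain
	threading.Thread with a name and exception logging in place of vdsm's
	concurrent module (the helper name and delay handling are illustrative):

```python
import logging
import threading
import time

def delayed_call(func, delay, name):
    """Run func after a delay in a named thread, replacing
    threading.Timer; exceptions are logged instead of lost."""
    def run():
        try:
            time.sleep(delay)
            func()
        except Exception:
            logging.exception("Unhandled exception in %s", name)
    t = threading.Thread(target=run, name=name, daemon=True)
    t.start()
    return t
```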

2021-10-29  Sandro Bonazzola  <sbonazzo@redhat.com>

	copr: enable copr builds

2021-10-29  Harel Braha  <hbraha@redhat.com>

	Change pickle protocol version
	The source files used during build time on Fedora 34 are
	in a cpickle format that is not backward compatible.
	As a quick workaround, change the pickle protocol version that we use
	to '4', which should work on both Fedora 34 and el8.

2021-10-28  Harel Braha  <hbraha@redhat.com>

	Fix errors hidden in %posttrans scriptlet
	In the vdsm %posttrans scriptlet, all errors were hidden. Since we
	used ">/dev/null 2>&1", stdout was redirected to /dev/null, and by
	redirecting stderr to stdout, all errors ended up in /dev/null as
	well.

	Bug-Url: https://bugzilla.redhat.com/2014865

2021-10-26  Ales Musil  <amusil@redhat.com>

	net, tests: Document new way of building images

	net, tests: Reorganize network containers
	Add a makefile for easier building and distinguish the CentOS release
	by tags instead of a separate container name.

	net, tests: Add Dockerfiles for el9s containers
	In order to test everything on CentOS Stream 9 as well, add
	Dockerfiles with el9s as the base image.

2021-10-25  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: continue and process all VMs
	Commit a7d59f0b52 provided a fix in the logic but introduced a bug.
	Instead of stopping just the processing of the current VM, the return
	statement terminates the loop and leaves all the following VMs
	unprocessed. This is obviously wrong, as we want to go through all VMs
	in every poller run.

	Bug-Url: https://bugzilla.redhat.com/1981946
	Backport-To: 4.4

2021-10-21  Liran Rotenberg  <lrotenbe@redhat.com>

	snapshot: fix the recovery mechanism
	The recovery mechanism was considered to be used only when a memory
	operation is executed, as this might take a long time and something
	might happen to VDSM. This might cause undesired behavior when we
	recover after a non-memory snapshot, which executes fast in libvirt
	but might take time to freeze, or even if we missed libvirt's
	execution of the snapshot, meaning we lost VDSM for the exact period
	of time of the execution.

	In such cases the snapshot might incorrectly finish successfully:
	- If we are in the middle of the domain job, we will catch it during the
	  recovery process since we check the domain job and report correctly.
	- If we are after the domain job, it will act as succeeded (not
	  precise, since the operation might have failed and we missed it).
	- If we are before the domain job, it will act wrongly, marking it as
	  succeeded.

	In this patch we determine if the snapshot succeeded by checking the
	VM's domain XML. In it, we look for new volumes that were made for
	the snapshot job.

	Bug-Url: https://bugzilla.redhat.com/2012832

2021-10-20  Nir Soffer  <nsoffer@redhat.com>

	tests: Test extension when allocation is missing
	During backup, libvirt reports allocation=0. Add a test simulating this
	situation, and verifying that we extend the disk when getting a block
	threshold event.

	Related-to: https://bugzilla.redhat.com/2015121

2021-10-20  Roman Bednar  <rbednar@redhat.com>

	tests: livemerge: verify imageSyncVolumeChain arguments
	This test should help us better understand how volume chain sync works.

	Also it should verify if livemerge code is sending correct volume chain
	to this function.

	In current code, when imageSyncVolumeChain is called via tryPivot(),
	the input volume chain is the actual chain as reported by Drive but
	without leaf volume id.

	Bug-Url: https://bugzilla.redhat.com/1949475

2021-10-20  Ales Musil  <amusil@redhat.com>

	net, tests: Align common steps between contianers

2021-10-19  Nir Soffer  <nsoffer@redhat.com>

	zombiereaper: Reap out last traces of it
	Remove zombiereaper from vdsm and supervdsm. With this change the fd
	leak in supervdsm should be fixed.

	Bug-Url: https://bugzilla.redhat.com/1926589

2021-10-18  Nir Soffer  <nsoffer@redhat.com>

	drivemonitor: Fix extend during backup
	During backup, libvirt does not report the allocation for chunked drives.
	If we enable debug logs, we will see:

	    2021-10-18 02:12:48,025+0300 DEBUG (periodic/2) [virt.vm]
	    (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Extension info for drive sdb
	    volume 624b1158-f2b9-4b4f-91c4-332e5594c8fb: BlockInfo(capacity=4294967296,
	    allocation=0, physical=3221225472) (vm:1277)

	We have the same issue before the guest writes to the drive, so
	DriveMonitor.should_extend_drive() does not complain and returns False
	for this case.

	The result is that during backup, when we get a block threshold event,
	or when the VM paused, we never extend the disk, so the VM cannot be
	resumed until the backup is stopped.

	Once the backup is stopped, libvirt again reports the actual
	allocation, and vdsm extends the disk and resumes the VM.

	This issue is likely a bug in libvirt or qemu, but we can handle the
	missing allocation in a smarter way. If we don't have allocation info,
	but the drive block threshold was exceeded, we know that the guest wrote
	to the disk, and we can assume that the disk needs extension.

	Since this is not an expected situation, we log a new warning:

	    2021-10-18 02:12:48,025+0300 WARN  (periodic/2) [virt.vm]
	    (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive
	    sdb, but block threshold was exceeded - assuming that drive needs extension
	    (drivemonitor:253)

	We may silence this warning later when we are more confident about this
	situation, and after we resolve this issue with libvirt and qemu folks.

	Since this is an urgent fix that we want to get into 4.4.9, a test for
	this scenario will be added later.

	Bug-Url: https://bugzilla.redhat.com/2015121

2021-10-17  Lev Veyde  <lveyde@redhat.com>

	tool: sanlock: Fixed upgrading old sanlock
	In commit 8fb0596b3ae6cf6befae4d9fb4d97cbaeea9c543
	    tool: sanlock: Validate max_worker_threads

	We tried to always validate the "max_worker_threads" value, assuming
	that sanlock always reports the value, since we require sanlock 3.8.3.
	However, when upgrading an older sanlock, running vdsm-tool after
	the upgrade will fail because sanlock does not report
	"max_worker_threads".

	Fix the validation so a missing value is treated as an incorrect
	configuration.

	Bug-Url: https://bugzilla.redhat.com/2013383

2021-10-16  Roman Bednar  <rbednar@redhat.com>

	tests: image: test internal and active volume chain sync
	Add tests that demonstrate how image.syncVolumeChain() works.

	There are two cases currently:

	1) if the top/leaf volume is being removed from the actual/libvirt
	chain, it is set to ILLEGAL in the current/vdsm chain

	2) if the leaf is not being removed - meaning an internal merge is
	ongoing - the chain has to shift parent volumes accordingly

	Bug-Url: https://bugzilla.redhat.com/1949470

2021-10-14  Nir Soffer  <nsoffer@redhat.com>

	v2v: Always wait for virt-v2v
	Previously we waited only 30 seconds after virt-v2v finished, or when
	aborting the process, and passed the pids to zombiereaper. We can
	replace zombiereaper with commands.wait_async(), but I don't think we
	should keep a live virt-v2v process accessing storage like that.

	This removes the last usage of zombiereaper from vdsm.

	Bug-Url: https://bugzilla.redhat.com/1926589

	supervdsm_server: Fix race when runAs times out
	When waiting for runAs child process, if the process disappears right
	after we checked if the process is alive, os.kill() can fail with
	ProcessLookupError, hiding the Timeout error from the caller.

	Fix by handling ProcessLookupError around os.kill(). Now we handle the
	error and raise the expected Timeout exception in the caller.

	Change the way we check if the process is alive according to
	multiprocessing docs[1]. After calling join(1), we can check the
	process exitcode instead of calling is_alive().
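
	The race and its fix can be sketched roughly like this (the Timeout
	class and helper name are illustrative, not vdsm's actual code):

```python
import multiprocessing
import os
import signal

class Timeout(Exception):
    pass

def wait_child(proc, timeout=1):
    """After join(timeout), check exitcode (per the multiprocessing
    docs) instead of is_alive(); if the child must be killed, tolerate
    it disappearing between the check and os.kill()."""
    proc.join(timeout)
    if proc.exitcode is None:
        try:
            os.kill(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # child died right after the check; still a timeout
        proc.join()
        raise Timeout("child %d did not finish in %s seconds"
                      % (proc.pid, timeout))
    return proc.exitcode
```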

	[1] https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.join

	Bug-Url: https://bugzilla.redhat.com/1926589

	supervdsm_server: Use commands.wait_async()
	Use commands.wait_async() instead of zombiereaper to wait for slow
	validateAccess after a timeout.

	This removes 1/2 uses of zombiereaper.

	Bug-Url: https://bugzilla.redhat.com/1926589

	supervdsm_server: Simplify error handling
	Error handling in _runAs was more complex than needed. Simplify the flow
	using try-finally, and minimize the usage of asynchronous reaping by
	terminating the child gracefully, and waiting for it one second before
	killing it and passing it to zombie reaper.

	We used to check if the process exists after sending a kill signal.
	This should never be needed if we check that the process is alive.
	The process cannot disappear before we join it.

	Bug-Url: https://bugzilla.redhat.com/1926589

	supervdsm_server: Remove safe_poll
	We used a polling loop to protect from interrupted poll() in
	multiprocessing. This is not an issue in Python 3, since interrupted
	system calls are restarted internally.

	Bug-Url: https://bugzilla.redhat.com/1926589

	supervdsm_server: Remove temporary variables
	Remove the unneeded res and err temporary variables and improve
	formatting to make the code clearer.

	Bug-Url: https://bugzilla.redhat.com/1926589

	supervdsm_server: Simplify communication with child
	The current code is confusing; multiprocessing.Pipe() returns 2
	connections, not pipes. Both the parent and child read and write from
	both ends of the pipe, which seems pointless.

	Simplify the communication with the child by using a non-duplex pipe.
	The child gets the write end and sends the result. The parent polls the read
	end and receives the result.

	Bug-Url: https://bugzilla.redhat.com/1926589

	iscsiadm: Use commands.wait_async()
	Use commands.wait_async() instead of zombiereaper to wait for slow
	fc-scan after a timeout.

	This does not solve the issue of starting a new scan before the first
	scan has finished. The new scan is likely to block in the kernel until
	the previous scan completes.

	This command runs with sudo, so it is not strictly required for fixing
	the fd leak in supervdsm, but we can have the same issue in vdsm, so
	we want to remove all usage of zombiereaper.

	This removes 1/3 uses of zombiereaper.

	Bug-Url: https://bugzilla.redhat.com/1926589

	hba: Use commands.wait_async()
	Use commands.wait_async() instead of zombiereaper to wait for slow
	fc-scan after a timeout.

	This does not solve the issue of starting a new scan before the first
	scan has finished. The new scan is likely to block in the kernel until
	the previous scan completes.

	This removes 1/4 uses of zombiereaper.

	Bug-Url: https://bugzilla.redhat.com/1926589

2021-10-14  Sandro Bonazzola  <sbonazzo@redhat.com>

	automation: add s390x builds

2021-10-14  Nir Soffer  <nsoffer@redhat.com>

	commands: Support asynchronous wait
	We have 4 cases where we return early before a child process has
	terminated, and let zombiereaper wait for the command. In some cases,
	like the storage refresh flow, there is a good reason not to block
	callers: waiting on a stuck command would hold the storage global
	lock, which may block any storage flow on unrelated storage domains.

	The issue with zombiereaper is registering the SIGCHLD signal. Once we
	do this, every command run by vdsm or supervdsm will trigger a SIGCHLD
	signal when it terminates, interrupting random code. It turns out that
	the Python with statement is not safe[1], and if a signal interrupts
	the code at the right time, __exit__ will not be called. This can
	cause resource leaks (see the linked bug) or other bad effects.

	This change introduces a simple mechanism for asynchronous waiting. When
	you want to return early from a command, you can do:

	    commands.wait_async(proc)

	This starts a background thread communicating with the process and
	returns immediately, so you can return early without blocking. When
	the process terminates, we log the exit code, output, and error.
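
	A rough sketch of what such an asynchronous wait could look like (an
	illustration of the idea, not vdsm's actual commands.wait_async()
	implementation):

```python
import logging
import threading

def wait_async(proc):
    """Wait for a subprocess.Popen in a background thread, logging
    the result, so the caller can return early without leaving a
    zombie process behind."""
    def run():
        logging.info("Waiting for process %d", proc.pid)
        out, err = proc.communicate()
        logging.info(
            "Process terminated with rc=%r out=%r err=%r",
            proc.returncode, out, err)
    t = threading.Thread(
        target=run, name="wait/%d" % proc.pid, daemon=True)
    t.start()
    return t
```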

	Here is example logs from the tests:

	19:13:09,440 INFO    (wait/860951) [common.commands] Waiting for process
	860951 (commands:202)

	19:13:09,542 INFO    (wait/860951) [common.commands] Process terminated
	with rc=0 out=b'out\n' err=b'err\n' (commands:204)

	[1] https://bugs.python.org/issue29988

	Bug-Url: https://bugzilla.redhat.com/1926589
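	The described mechanism can be sketched roughly like this (a minimal
	illustration, not the actual vdsm implementation; the logger name and
	thread naming follow the log example above but are assumptions):

```python
import logging
import threading

log = logging.getLogger("common.commands")


def wait_async(proc):
    # Reap the child in a background thread so the caller returns
    # immediately instead of blocking on a stuck command.
    def run():
        log.info("Waiting for process %d", proc.pid)
        out, err = proc.communicate()
        log.info("Process terminated with rc=%r out=%r err=%r",
                 proc.returncode, out, err)

    t = threading.Thread(target=run, name="wait/%d" % proc.pid)
    t.daemon = True
    t.start()
    return t
```

	No SIGCHLD handler is involved, so running code is never interrupted.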

2021-10-14  Sandro Bonazzola  <sbonazzo@redhat.com>

	automation: enable el9stream build jobs

2021-10-13  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.50.0.1

2021-10-12  Sandro Bonazzola  <sbonazzo@redhat.com>

	packaging: el9: replace genisoimage with xorriso
	See also: https://bugzilla.redhat.com/1971840

2021-10-12  Nir Soffer  <nsoffer@redhat.com>

	nbd: Add detect_zeroes option
	Detecting zeroes is best done on the client side, avoiding sending
	actual zeroes over the wire, but no client supports this yet. Enabling
	zero detection on the server side works for all clients.

	NBDServerConfig now supports a new "detect_zeroes" option. If set to
	True, qemu-nbd is started with the --detect-zeroes option, so it
	detects zeroes in writes and converts them to optimized write zeroes
	commands.

	Detecting zeroes has two modes:

	- If the "discard" option is set to True, we use --detect-zeroes=unmap.
	  This deallocates space in the volume and can improve performance by
	  not writing actual zeroes to storage.

	- If the "discard" option is set to False or not set, we use
	  --detect-zeroes=on.  This can improve performance by allocating space
	  without writing actual zeroes.

	Detecting zeroes adds little overhead, so writing fully allocated images
	(e.g. ISO images) is expected to be slightly slower.
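	The mode selection above can be sketched as follows (an illustrative
	mapping, not the actual NBDServerConfig code; the function name is
	hypothetical):

```python
def detect_zeroes_mode(detect_zeroes, discard):
    # Map the two options described above to the qemu-nbd
    # --detect-zeroes argument value, or None when disabled.
    if not detect_zeroes:
        return None
    # "unmap" deallocates space; "on" allocates without writing zeroes.
    return "unmap" if discard else "on"
```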

	Previously we tested the discard option when testing uploads and
	downloads, but this option has no effect when uploading images using
	qemu-img convert. Now we test the discard option only when detecting
	zeroes.

	Engine can always enable zero detection to keep sparse volumes sparse
	and improve performance.

	Bug-Url: https://bugzilla.redhat.com/1616436

2021-10-11  Liran Rotenberg  <lrotenbe@redhat.com>

	snapshot: refactor
	This patch renames the functions to make them private.

2021-10-11  Milan Zamazal  <mzamazal@redhat.com>

	virt: Remove obsolete recovery params
	`smp' and `memSize' parameters passed to a Vm instance are not used
	anymore.  Their retrieval was introduced in
	https://gerrit.ovirt.org/c/vdsm/+/81529 and the parameters were used
	in Host.getVMFullList API call response.  But Host.getVMFullList is
	not called anymore and Engine rather retrieves that data directly from
	the domain XML.

2021-10-07  Liran Rotenberg  <lrotenbe@redhat.com>

	configurators: switch to raw memory dump
	Using gzip on block devices can cause an error once libvirt tries to
	decompress the memory dump volume: "gzip: stdin: decompression OK,
	trailing garbage ignored", which causes a failure in libvirt and
	cancels the operation. The old `lzop` compression gives bad
	performance. Hence, we drop the compression for now, fixing the
	functionality on block storage without harming performance.

	Bug-Url: https://bugzilla.redhat.com/1978672

2021-10-06  Filip Januska  <fjanuska@redhat.com>

	build: Replace deprecated 'iterkeys()'
	A previous patch replaced all python2 shebangs with python3.
	This caused a bug with the smbios hook, which uses a python2
	way of iterating a dictionary - "d.iterkeys()".
	This patch replaces it with a python3 alternative - "iter(d)".
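	For reference, the py2/py3 difference looks like this:

```python
d = {"a": 1, "b": 2}

# Python 2 only: d.iterkeys() -- removed in Python 3.
# Python 3 alternative: iterate the dict (or iter(d)) directly.
keys = list(iter(d))
```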

	Bug-Url: https://bugzilla.redhat.com/2008431

2021-10-06  Liran Rotenberg  <lrotenbe@redhat.com>

	snapshot: introduce non-memory timeout
	Although the abort thread is irrelevant for the non-memory snapshot
	flow, we still have limitations. If the pivot already happened, we
	shouldn't abort it. But if we abort before that, the operation is most
	likely stuck on the freeze operation. We can't abort the freeze, and
	we only get a response once the file system is frozen. Windows VMs
	have a 10 minute limit on file system freeze, so a snapshot that takes
	more than 10 minutes to freeze will make the data inconsistent.
	10 minutes would work fine for a single file system, but to stay on
	the safe side we use 8 minutes, keeping a 2 minute margin to execute
	the pivot while still leaving enough time for the whole file system to
	be frozen. The API respects any other value given for the new timeout
	if an override is desired.
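	The arithmetic behind the default timeout can be written out as
	(constant names here are illustrative, not the actual vdsm names):

```python
WINDOWS_FREEZE_LIMIT = 10 * 60  # Windows limits a fs freeze to 10 minutes
PIVOT_MARGIN = 2 * 60           # margin reserved to execute the pivot

# Default non-memory snapshot timeout: 8 minutes.
DEFAULT_NONMEMORY_TIMEOUT = WINDOWS_FREEZE_LIMIT - PIVOT_MARGIN
```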

	Bug-Url: https://bugzilla.redhat.com/1985973

2021-10-06  Ales Musil  <amusil@redhat.com>

	spec: Require unversioned vdsm for hooks
	To prevent upgrade issues require unversioned vdsm
	for hooks.

	Bug-Url: https://bugzilla.redhat.com/2004469

2021-10-05  Nir Soffer  <nsoffer@redhat.com>

	lvm: Remove fixed pylint comment
	In commit b13ef1beb8c58cd461dd4b817420c70c7a2de1f0

	    Pylint: fix E1120 (no-value-for-parameter)

	The missing value for parameter was fixed, but the pylint: comment was
	not removed.

2021-10-05  Vojtech Juranek  <vjuranek@redhat.com>

	virt: use proper disk type when changing CD
	So far we always used the FILE disk type when changing a CD. This is
	wrong, and in some cases libvirt detects that and fails with

	    libvirt.libvirtError: internal error: unable to execute QEMU command
	    'blockdev-add': 'file' driver requires '/rhev/data-center/...' to be
	    a regular file

	Use the appropriate disk type set up by the prepareVolumePath() call.

	Add tests for file and block based storage. Network disks don't need
	to be handled in the code or tested, as network disks are not
	supported as CDROM devices.

	Bug-Url: https://bugzilla.redhat.com/1990268

2021-10-05  Vojtěch Juránek  <vjuranek@redhat.com>

	spec: bump libvirt version
	Bump libvirt version to get fix for BZ #2003644. This issue causes
	CD change failure when original CD has defined startupPolicy and we
	try to switch to block-based CD, where startupPolicy is not supported.

	As required libvirt version (7.6.0-4) is not available on CentOS stream
	yet, require this version only on RHEL for now.

	Bug-Url: https://bugzilla.redhat.com/1990268
	Related-To: https://bugzilla.redhat.com/2003644

2021-10-05  hbraha  <hbraha@redhat.com>

	Pylint: fix E1120 (no-value-for-parameter)
	Pylint version update[1] revealed an E1120 (no-value-for-parameter)
	error: the exception constructor did not receive all required
	parameters.
	We don't really reach the point where we run an LVM command, and the
	exception seems to describe a scenario when incorrect tags are passed.
	Therefore, the 'ValueError' exception seems logically appropriate.
	Line 1409 contains a similar issue which is handled in another path[2].

	[1] https://gerrit.ovirt.org/#/c/vdsm/+/116871/
	[2] https://gerrit.ovirt.org/#/c/vdsm/+/116258

2021-10-05  Marcin Sobczyk  <msobczyk@redhat.com>

	Upgrade pylint version
	It appears that internal pylint errors [1] such as:
	"internal error with sending report for module
	['lib/vdsm/storage/lvm.py'] 'Name' object has no attribute 'value'"

	prevent Pylint from working properly and cause errors to be reported
	in some modules. A pylint bug was reported for the issue[2].
	Bumping the pylint version seems to overcome this error.

	Several true positive errors are reported by the new pylint version,
	such as:
	"lib/vdsm/storage/lvm.py:1409:14: E1120: No value for argument 'rc' in
	constructor call (no-value-for-parameter) "

	Additionally, false positive errors are also reported:
	"lib/vdsm/storage/volumemetadata.py:101:29: E1102: validate is not
	callable (not-callable)"

	In order to pass the build and upgrade the pylint version, the errors
	have been skipped. The true positive errors will be handled in the
	next patch.

	[1] https://jenkins.ovirt.org/blue/organizations/jenkins/
	vdsm_standard-check-patch/detail/vdsm_standard-check-patch/29942/
	pipeline/151#step-264-log-1372

	[2] https://github.com/PyCQA/pylint/issues/5113

2021-10-04  Roman Bednar  <rbednar@redhat.com>

	lvm: use from_lvmerror() helper for all remaining lvm exceptions
	The exception helper from_lvmerror() is already used with other flows
	and helps with reraising lvm errors while keeping the original
	traceback.

	Traceback example in this patch:
	https://gerrit.ovirt.org/c/vdsm/+/116228

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in refreshLVs() flow
	Modify refreshLVs() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in renameLV() flow
	Modify renameLV() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in reduceLV() and extendLV() flows
	Modify reduceLV() and extendLV() flows to use new run_command() which
	now raises LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Example traceback:

	Traceback (most recent call last):
	  File "/vdsm/tests/storage/lvm_test.py", line 283, in test_extendlv_failure_cache
	    lvm.extendLV(fake_vg.name, fake_lv.name, 100)
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 1736, in extendLV
	    raise se.LogicalVolumeExtendError.from_lvmerror(e)
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 1712, in extendLV
	    _lvminfo.run_command(cmd, devices=_lvminfo._getVGDevs((vgName,)))
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 561, in run_command
	    raise error
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 536, in run_command
	    return self._runner.run(full_cmd)
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 356, in run
	    raise se.LVMCommandError(cmd, rc, out, err)
	vdsm.storage.exception.LogicalVolumeExtendError: Logical Volume extend failed: 'cmd=[\'/usr/sbin/lvm\', \'lvextend\', \'--config\', \'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/a$|^/dev/mapper/b$|", "r|.*|"]  hints="none"  obtain_device_list_from_udev=0 } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }\', \'--autobackup\', \'n\', \'--size\', \'100m\', \'vg/lv\'] rc=5 out=[] err=[]'

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in createLV() flow
	Modify createLV() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Bug-Url: https://bugzilla.redhat.com/1536880

2021-10-01  Marcin Sobczyk  <msobczyk@redhat.com>

	el9s: spec: Require 'crontabs'
	We have an hourly cron job distributed with vdsm [1], so we should
	require 'crontabs' package.

	This was found with 'rpmlint' in el9s, reported as
	'missing-dependency-to-cron'. The change in the spec doesn't fix the
	'rpmlint' run though, as the check in the shipped version (1.11 at the
	moment of writing) is broken.  It requires a dependency to 'cron'
	package, but the usual name for the package providing cron functionality
	is 'cronie'. Furthermore, the dependency should actually be on
	'crontabs', which is fixed in newer versions of 'rpmlint' [2].

	[1] https://github.com/oVirt/vdsm/blob/b4bbb2df896d5ed15f4bc0c5ca4adff0335535e4/vdsm.spec.in#L870
	[2] https://github.com/rpm-software-management/rpmlint/pull/404

	el9s: spec: Simplify requirement for 'nfs-utils'
	We're way past some CentOS/RHEL 7 versions, so let's
	use an unversioned requirement for 'nfs-utils'.

	el9s: spec: Unify 'gluster_version'
	There's no point in keeping these conditionals if we use the same
	version everywhere.

	el9s: spec: Simplify some of the requirements
	We're way past 'kernel' 3.10 and some CentOS 7 version
	of 'selinux-policy-targeted'. Same thing goes for the required
	Fedora versions of the packages.

	Let's keep the kernel version on par with CentOS/RHEL 8.4
	and get rid of the other outdated requirements.

	el9s: spec: Simplify requirement for qemu-kvm
	We don't support distros other than RHEL derivatives >= 8.
	Let's remove the unnecessary if condition.

2021-09-30  Eyal Shenitzky  <eshenitz@redhat.com>

	qemuimg: use '--skip-broken-bitmaps' in qemu-img convert
	If a bitmap becomes inconsistent after abnormal vm termination,
	converting the volume with the inconsistent bitmap using the
	'--bitmaps' option will fail with the following error:

	qemu-img: Failed to populate bitmap 5f59b2d6-6b52-484c-ae7a-f8b43f2175a4:
	Bitmap \'5f59b2d6-6b52-484c-ae7a-f8b43f2175a4\' is inconsistent and cannot
	be used\nTry block-dirty-bitmap-remove to delete this bitmap from disk"

	This failure will fail cold/live disk copy.

	In qemu-6.0.0 a new '--skip-broken-bitmaps' flag was introduced to skip those
	broken bitmaps and avoid failing the convert operation.

	Also, a new test was added for converting a volume with an
	inconsistent bitmap.
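	A rough sketch of building such a convert command line (the helper
	name is hypothetical; --bitmaps and --skip-broken-bitmaps are real
	qemu-img options, the latter available since qemu 6.0):

```python
def convert_cmd(src, dst, dst_format="qcow2", bitmaps=True):
    # Build a qemu-img convert command line; --skip-broken-bitmaps only
    # makes sense together with --bitmaps, where an inconsistent bitmap
    # would otherwise fail the whole convert operation.
    cmd = ["qemu-img", "convert", "-O", dst_format]
    if bitmaps:
        cmd += ["--bitmaps", "--skip-broken-bitmaps"]
    cmd += [src, dst]
    return cmd
```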

	Bug-Url: https://bugzilla.redhat.com/1984852

	spec: update qemu-kvm requirement
	Bump qemu-kvm version to consume the new '--skip-broken-bitmaps' flag.

	Bug-Url: https://bugzilla.redhat.com/1984852

2021-09-29  Nir Soffer  <nsoffer@redhat.com>

	guestagenthelpers_test: Test timestamp parsing
	We tested only the case when the guest agent returns an iso date string.
	Add the missing test for integer timestamp.

2021-09-27  Milan Zamazal  <mzamazal@redhat.com>

	virt: Fix guest agent conversion of the driver date
	Driver date returned from the guest agent represented by nanoseconds
	doesn't have a time zone attached.  It doesn't make sense to return
	different driver dates for different host time zones.  Let's interpret
	the driver nanoseconds as belonging to UTC.

	Let's also divide the nanoseconds by an integer rather than a float to
	avoid possible floating point issues.

	New release: 4.50.0

2021-09-27  Marcin Sobczyk  <msobczyk@redhat.com>

	el9s: spec: Move comment
	'rpmlint' on el9s is picky and complains about "extra tokens
	at the end of %else directive". Let's move the comment below.

	el9s: spec: Use versioned obsoletes
	'rpmbuild' on el9s seems to be picky and complains:

	 specfile-error warning: line 294: It's not recommended to have unversioned Obsoletes: Obsoletes: vdsm-hook-vfio-mdev
	 specfile-error warning: line 296: It's not recommended to have unversioned Obsoletes: Obsoletes: vdsm-hook-floppy
	 ...

	Let's simply use obsoletes for versions earlier than
	the current one.

	pthreading: Remove monkeypatching
	We used to need to monkeypatch the 'pthreading' module
	in py2, but we don't support it anymore - let's get rid of it.

	testlib: Get rid of 'mock' compatibility layer
	We don't support py2 anymore so we should always import from 'unittest'.

	compat: Remove 'enum' from compatibility layer
	We don't need 'enum' in the 'compat' module anymore.

	compat: Remove 'pickle' compatibility layer
	We don't support py2 anymore so we don't need to keep
	'pickle' in our 'compat' module.

2021-09-27  Sandro Bonazzola  <sbonazzo@redhat.com>

	pep8: fix E402
	fixed errors E402 (https://www.flake8rules.com/rules/E402.html)
	detected by flake8.
	When the error was due to the compatibility layer for python2, the
	hack was dropped, as we no longer build for distributions using
	python2.

	Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=2004412

2021-09-23  Ales Musil  <amusil@redhat.com>

	net: Explicitly enforce MAC on linux bridge
	The kernel usually assigns the MAC address of the first port to a
	newly created bridge; however, this behavior can be changed by the
	systemd option "MACAddressPolicy". To prevent any issues when the
	default option changes, explicitly enforce the MAC address on the
	bridge when possible.

	Bug-Url: https://bugzilla.redhat.com/2005213

	net, ovs: Move MAC enforce code
	Move the MAC enforce code to a more appropriate place and extract the
	MAC getter into CurrentState. This allows it to be reused later by
	the linux bridge.

	Bug-Url: https://bugzilla.redhat.com/2005213

2021-09-21  Roman Bednar  <rbednar@redhat.com>

	lvm: use run_command() in reduceVG() flow
	Modify reduceVG() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Also provide a new from_lvmerror() classmethod which will simplify
	reraising lvm errors (used in next patches).

	Traceback example:

	Traceback (most recent call last):
	  File "/vdsm/tests/storage/lvm_test.py", line 421, in test_reducevg_failure_cache
	    lvm.reduceVG(fake_vg.name, fake_pv2.name)
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 1511, in reduceVG
	    raise se.VolumeGroupReduceError.from_lvmerror(e)
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 1509, in reduceVG
	    _lvminfo.run_command(cmd, devices=_lvminfo._getVGDevs((vgName, )))
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 561, in run_command
	    raise error
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 536, in run_command
	    return self._runner.run(full_cmd)
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 356, in run
	    raise se.LVMCommandError(cmd, rc, out, err)
	vdsm.storage.exception.VolumeGroupReduceError: Cannot reduce the Volume Group: 'cmd=[\'/usr/sbin/lvm\', \'vgreduce\', \'--config\', \'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/a$|^/dev/mapper/b$|", "r|.*|"]  hints="none"  obtain_device_list_from_udev=0 } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }\', \'vg\', \'/dev/mapper/pv2\'] rc=5 out=[] err=[]'

	Bug-Url: https://bugzilla.redhat.com/1536880

2021-09-20  Sandro Bonazzola  <sbonazzo@redhat.com>

	automation: update repos for 4.5 / el8stream

2021-09-20  Ales Musil  <amusil@redhat.com>

	net, tests: Skip unicode bridge name test on el9s
	The unicode bridge test is failing on CentOS Stream 9;
	xfail it for now.

	net, tests: Fix the chroot error message check
	The message on el8s is "Running in chroot, ignoring request",
	and on el9s "Running in chroot, ignoring command". To support both,
	shorten the message to just "Running in chroot", which should be a
	good enough indication.

2021-09-19  Eyal Shenitzky  <eshenitz@redhat.com>

	qemuio.py: Introduce abort() helper to crash QEMU process
	Introduce the new abort() helper to simulate a crash of the QEMU
	process. It opens the given image for writing and kills the process.

2021-09-17  Sandro Bonazzola  <sbonazzo@redhat.com>

	automation: add el9stream repo config
	add repositories for el9stream testing

2021-09-17  Marcin Sobczyk  <msobczyk@redhat.com>

	py39: gluster: Remove usage of deprecated 'Element.getchildren' method
	'xml.etree.ElementTree.Element.getchildren' method is deprecated [1].
	One should simply iterate over the element.

	[1] https://docs.python.org/3.6/library/xml.etree.elementtree.html?highlight=getchildren#xml.etree.ElementTree.Element.getchildren
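	The replacement is a direct iteration over the element:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><b/></root>")

# Deprecated (and removed in Python 3.9): root.getchildren()
# Instead, iterate the element directly:
tags = [child.tag for child in root]
```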

2021-09-16  Ales Musil  <amusil@redhat.com>

	net: Add ovnConfigured to capabilities
	Add ovnConfigured boolean that indicates if
	OVN is properly configured on the host.

	Bug-Url: https://bugzilla.redhat.com/1940824

	net: Add check if OVN is properly configured on the host
	Check if required services are running namely:
	- ovn-controller
	- ovsdb-server
	- openvswitch-ipsec

	Check if OVN certificates do exist with proper permissions.
	Lastly check if OvS system-id corresponds to at least one DNS
	name in the SAN extension of the OVN certificate.

	Bug-Url: https://bugzilla.redhat.com/1940824

2021-09-16  Marcin Sobczyk  <msobczyk@redhat.com>

	el9stream: tests: lib: Adapt to changed 'SC_NPROCESSORS_ONLN' behaviour
	'TestExecCmdAffinity' tests are here because we want to be sure that
	even though we pin the vdsm process to a single CPU, other spawned
	commands should run on all available CPUs by default.

	The test functions are decorated with '@forked' so we can use 'taskset'
	on the process executing the test itself without affecting other tests.
	In 'testResetAffinityWhenConfigured' first we pin the test process
	to a single CPU and then execute a separate 'sleep' process to see if it
	runs on all CPUs.

	The problem is the comparison:

	 self.assertEqual(taskset.get(proc.pid), online_cpus())

	which _dynamically_ retrieves the number of the active processors
	for the current process with:

	 os.sysconf('SC_NPROCESSORS_ONLN')

	If we just pinned our process to a single CPU, why would we expect it
	to return all the available CPUs? Weird, but it seems it indeed worked
	that way... until el9stream, where the behavior changed (and is
	probably more reasonable).

	One can test that with interactive python console on both el8 and
	el9stream:

	 Python 3.x.x ...
	   >>> import os
	   >>> os.getpid()
	   129348
	   >>> os.sysconf('SC_NPROCESSORS_ONLN')
	   8

	then in a separate console limit the process's CPUs:

	 $ taskset -c -p 0 129348
	 pid 129348's current affinity list: 0-7
	 pid 129348's new affinity list: 0

	and try in the python console again:

	   >>> os.sysconf('SC_NPROCESSORS_ONLN')

	which will print out '8' in el8 and '1' in el9stream.

	The fix in our case seems simple - let's save the real CPU
	count before forking so we get consistent results.
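	The fix amounts to capturing the value once, before any affinity
	pinning happens (a simplified sketch of the idea; names are
	illustrative, not the actual test code):

```python
import os

# On el9, SC_NPROCESSORS_ONLN reflects the current affinity mask, so
# save the real CPU count at import time, before any test pins the
# process to a single CPU.
_ONLINE_CPUS = os.sysconf('SC_NPROCESSORS_ONLN')


def online_cpus():
    # Return the CPU count captured before forking/pinning, giving
    # consistent results on both el8 and el9.
    return _ONLINE_CPUS
```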

	py39: tests: lib: Fix test class behavior
	In 'TestTerminating' test class we override the 'Popen.{poll,kill,wait}'
	methods on 'self.proc' instance with our own in test set up. We then
	use the saved, original 'self.proc_{poll,kill,wait}' versions in test
	cleanup.

	In py39 the 'send_signal' function called from 'Popen.kill' calls
	'Popen.poll' [1], which calls our "fake" method and exposes the flaw
	in our reasoning - instead of what we're doing right now, we should
	restore the original methods on the instance itself.

	[1] https://github.com/python/cpython/blob/4ce55cceb2901c564962f724448a9ced00c8a738/Lib/subprocess.py#L2058
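	A minimal illustration of the flaw and the fix, using a throwaway
	subprocess (a sketch only, not the actual test code):

```python
import subprocess

proc = subprocess.Popen(["sleep", "60"])

# Test setup overrides the method on the instance...
proc.poll = lambda: None

# ...so in py39, kill() -> send_signal() -> poll() would hit the fake.
# The fix: restore the original on the instance itself before cleanup.
del proc.poll  # instance attribute removed, class method visible again

proc.kill()
rc = proc.wait()
```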

2021-09-16  Sandro Bonazzola  <sbonazzo@redhat.com>

	flake8: fix E741
	Fix E741 ambiguous variable name 'l'
	and re-enable testing for it

	Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=2004412

	flake8: fix E305
	Fixed E305: expected 2 blank lines after class or function definition, found 1
	and re-enabled the test for it in tox.

	Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=2004412

	tox: update flake8 to 3.9.2
	3.5 is failing on el9/python 3.9

2021-09-15  Vojtěch Juránek  <vjuranek@redhat.com>

	flake8: fix various flake8 issues
	When upgrading to flake8 3.9.2, a couple of new issues appeared:

	    25    E117 over-indented
	    1     F507 '...' % ... has 1 placeholder(s) but 2 substitution(s)
	    1     F632 use ==/!= to compare constant literals (str, bytes, int, float, tuple)
	    2     F811 redefinition of unused 'key_cert_pair' from line 36
	    3     F841 local variable 'e' is assigned to but never used
	    5     W605 invalid escape sequence '\*'

	Fix all these issues, so we can upgrade flake8 without adding any new
	exceptions.

	Bug-Url: https://bugzilla.redhat.com/2004412

2021-09-14  Nir Soffer  <nsoffer@redhat.com>

	fileUtils: Fix error handling and logging in tarCopy
	When fileUtils.tarCopy() fails, we get an unhelpful error:

	    vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')

	This tuple (1, 0, b'', b'') means that the reader child process failed
	with rc=1, and the writer child process terminated normally with rc=0.
	However, the rest of the arguments are the empty stdout and stderr of
	the successful writer child process. The useful stderr of the failing
	reader child process is dropped.

	Fix error handling so we collect stderr from both the reader and
	writer child processes, and on error we raise a detailed exception
	describing all failing child processes.

	Add debug logs when starting the commands and when both child processes
	terminate, so we can find code writing to the master filesystem after
	the reader was started.

	Add an info log, since copying the master filesystem modifies the
	target, and we log system state changes at info level.

	On errors we now raise a new public error instead of a private one, so
	the detailed error will propagate to the engine log.

	Here is an example error when both reader and writer child
	processes failed (reformatted to shorten long lines):

	vdsm.storage.exception.TarCommandError: Tar command failed: (
	{'reader': {'cmd': ['/usr/bin/tar', 'cf', '-', '-C', '/src', '.'],
	'rc': 2, 'err': '/usr/bin/tar: /src: Cannot open: No such file or directory\n'
	'/usr/bin/tar: Error is not recoverable: exiting now\n'},
	'writer': {'cmd': ['/usr/bin/tar', 'xf', '-', '-C', '/dst', '--touch'],
	'rc': 2, 'err': '/usr/bin/tar: This does not look like a tar archive\n'
	'/usr/bin/tar: Exiting with failure status due to previous errors\n'}},)
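	The error collection described above can be sketched like this
	(simplified; the real helper lives in vdsm.storage.fileUtils and
	raises TarCommandError rather than RuntimeError):

```python
import subprocess


def tar_copy(src, dst):
    # Pipe "tar cf" into "tar xf" and collect stderr from BOTH child
    # processes, so a failing reader is no longer silently dropped.
    reader = subprocess.Popen(
        ["tar", "cf", "-", "-C", src, "."],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    writer = subprocess.Popen(
        ["tar", "xf", "-", "-C", dst],
        stdin=reader.stdout, stderr=subprocess.PIPE)
    reader.stdout.close()  # writer now holds the only reference
    _, reader_err = reader.communicate()
    _, writer_err = writer.communicate()
    errors = {}
    if reader.returncode != 0:
        errors["reader"] = {"rc": reader.returncode, "err": reader_err}
    if writer.returncode != 0:
        errors["writer"] = {"rc": writer.returncode, "err": writer_err}
    if errors:
        # Report every failing process with its own rc and stderr.
        raise RuntimeError("Tar command failed: %r" % errors)
```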

	Bug-Url: https://bugzilla.redhat.com/1913764

2021-09-14  Roman Bednar  <rbednar@redhat.com>

	tests: better coverage for lvm exception handling
	Adds tests for multiple flows with focus on exception handling.

	Bug-Url: https://bugzilla.redhat.com/1536880

	tests: add fake device helpers
	Add helpers for creating fake devices.

	Bug-Url: https://bugzilla.redhat.com/1536880

2021-09-12  Nir Soffer  <nsoffer@redhat.com>

	resourceManager: Support with statement
	resourceManager.Lock can now be used as a context manager. This
	simplifies locking a single resource during a function.

	_ResourceManager.acquireResource() returns a ResourceRef which can be
	used as a context manager, but nobody understands that horribly complex
	code, and I don't trust it.
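	Adding with-statement support to a lock class is a small amount of
	protocol code (a generic sketch, not the actual resourceManager.Lock
	implementation):

```python
import threading


class Lock:
    """Exclusive lock usable as a context manager."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self):
        self._lock.acquire()

    def release(self):
        self._lock.release()

    def __enter__(self):
        # "with lock:" acquires on entry...
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # ...and releases on exit, even if an exception was raised.
        self.release()
```

	With this, "with lock: ..." replaces explicit try/finally
	acquire/release pairs around the locked section.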

	Bug-Url: https://bugzilla.redhat.com/1913764

	resourceManager: Switch to module logger
	This module was using a logger per class, with non-standard logger
	names. This is too complicated and unneeded. Change the class loggers
	to use a module logger with a standard name
	("storage.resourcemanager").

	Some class loggers create a logger adapter to include more context in
	the logs. These logs were not changed, but they also go to the module
	logger.

	Bug-Url: https://bugzilla.redhat.com/1913764

2021-09-10  Nir Soffer  <nsoffer@redhat.com>

	multipath: Fix typos in docstring

	task: Unify resourceManager import
	We always import resourceManager as rm, but for some reason we did not
	rename the import in the task module. Unify the import for consistency
	and to avoid long lines.

	Bug-Url: https://bugzilla.redhat.com/1913764

	sp: Move POOL_MASTER_DOMAIN to constants
	We need to use this constant from the task module. Moving it to
	constants helps minimize dependencies on the sp module and avoid
	import cycles.

	Bug-Url: https://bugzilla.redhat.com/1913764

	storage: Lowercase logger names
	All loggers should have lower case names to make it easier to grep the
	logs. All new modules have been using lower case names for the last 7
	years. Now all loggers are lowercase.

	We still have too many loggers and too deep logger hierarchy; more work
	is needed to fix this.

	Revert "hsm: add a wait after connection to iscsi storage domain"
	This reverts commit f6e17d579059d861b97ea2a0ca63aade08aa4946.

	multipath.rescan() now waits until multipathd is ready, so we don't
	need the hack of waiting 5 seconds after connecting to iSCSI targets.
	This improves the user experience, minimizing the wait when creating
	or managing an iSCSI storage domain.

	This change also mitigates the engine bug[1] of calling
	connectStoragePool before connectStorageServer completed, due to the
	way vdsm locks calls to StorageDomainCache.refreshStorage().

	[1] https://bugzilla.redhat.com/1993531

	Related-To: https://bugzilla.redhat.com/1807050
	Related-To: https://bugzilla.redhat.com/2000720

	multipath: Wait until multipathd is ready
	We used to wait until "udevadm settle" returned, hoping that when all
	udev events were processed, new multipath devices would be created.
	However, it seems that this is not effective now; or maybe it never
	worked and we only discovered it now.

	Here is an example showing the issue.

	In vdsm log we see HBA scan, as part of Host.getDeviceList() call:

	2021-08-27 14:08:33,285-0500 INFO  (jsonrpc/7) [storage.HBA] Scanning FC devices
	2021-08-27 14:08:33,779-0500 INFO  (jsonrpc/7) [storage.HBA] Scanning FC devices: 0.49 seconds

	In the same second, we see lot of events from kernel about detecting and
	adding scsi devices in /var/log/messages:

	Aug 27 14:08:33 kernel: scsi 1:0:0:243: Direct-Access PURE FlashArray 8888 PQ: 0 ANSI: 6
	Aug 27 14:08:33 kernel: scsi 4:0:0:243: Direct-Access PURE FlashArray 8888 PQ: 0 ANSI: 6
	Aug 27 14:08:33 kernel: scsi 1:0:0:243: alua: supports implicit TPGS
	Aug 27 14:08:33 kernel: scsi 1:0:0:243: alua: device naa.624a9370131099fcace44d860002de4a ...
	Aug 27 14:08:33 kernel: sd 1:0:0:243: Attached scsi generic sg68 type 0
	Aug 27 14:08:33 kernel: sd 1:0:0:243: [sdbo] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
	...
	Aug 27 14:08:33 kernel: sd 4:0:2:254: alua: port group 01 state A non-preferred supports tolUSNA
	Aug 27 14:08:33 kernel: sd 4:0:3:247: alua: port group 00 state A non-preferred supports tolUSNA
	Aug 27 14:08:33 kernel: sd 1:0:1:249: alua: port group 01 state A non-preferred supports tolUSNA

	Then we see events from multipathd about loading tables for new devices:

	Aug 27 14:08:34 multipathd[3936]: 3624a9370131099fcace44d860002de4a: load table [0 10737418240 multipath
	1 queue_if_no_path 1 alua 1 1 service-time 0 1 1 68:32 1]
	...
	Aug 27 14:08:34 multipathd[3936]: sdbo [68:32]: path added to devmap
	3624a9370131099fcace44d860002de4a
	...

	And vdsm completes the Host.getDeviceList() calls and return a response:

	2021-08-27 14:08:34,579-0500 INFO  (jsonrpc/7) [vdsm.api] FINISH getDeviceList return=...

	This response includes only device 3624a9370131099fcace44d860002de4a,
	with one path: "sdbo":

	    {
	      "GUID": "3624a9370131099fcace44d860002de4a",
	      "pathstatus": [
	        {
	          "physdev": "sdbo",
	          "state": "active",
	          "capacity": "5497558138880",
	          "lun": "243",
	          "type": "FCP"
	        }
	      ],
	      ...
	    },

	In the next 2 seconds, we see more events from multipathd about paths
	added to 3624a9370131099fcace44d860002de4a, and other devices we know
	were added to the hosts at that time:

	Aug 27 14:08:35 multipathd[3936]: 3624a9370131099fcace44d860002de49: load table [0 10737418240 multipath
	1 queue_if_no_path 1 alua 1 1 service-time 0 1 1 68:64 1]
	...
	Aug 27 14:08:35 multipathd[3936]: sdbq [68:64]: path added to devmap 3624a9370131099fcace44d860002de49
	...
	Aug 27 14:08:35 multipathd[3936]: 3624a9370131099fcace44d860002de48: load table [0 10737418240 multipath
	1 queue_if_no_path 1 alua 1 1 service-time 0 1 1 68:96 1]
	...
	Aug 27 14:08:35 multipathd[3936]: sdbs [68:96]: path added to devmap 3624a9370131099fcace44d860002de48
	...
	Aug 27 14:08:35 multipathd[3936]: 3624a9370131099fcace44d860002de47: load table [0 10737418240
	multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 1 68:112 1]
	...
	Aug 27 14:08:35 multipathd[3936]: sdbt [68:112]: path added to devmap 3624a9370131099fcace44d860002de47
	...
	Aug 27 14:08:35 multipathd[3936]: 3624a9370131099fcace44d860002de4a: load table [0 10737418240 multipath
	1 queue_if_no_path 1 alua 1 1 service-time 0 8 1 68:32 1 68:48 1 68:160 1 68:176 1 69:32 1 69:80 1
	69:128 1 69:224 1]
	...
	Aug 27 14:08:36 multipathd[3936]: 3624a9370131099fcace44d860002de49: load table [0 10737418240 multipath
	1 queue_if_no_path 1 alua 1 1 service-time 0 8 1 68:64 1 68:80 1 68:192 1 68:208 1 69:48 1 69:112 1
	69:144 1 69:240 1]
	...
	Aug 27 14:08:36 multipathd[3936]: 3624a9370131099fcace44d860002de48: load table [0 10737418240 multipath
	1 queue_if_no_path 1 alua 1 1 service-time 0 8 1 68:96 1 68:128 1 69:0 1 68:224 1 69:64 1 69:160 1
	69:192 1 70:0 1]
	...
	Aug 27 14:08:36 multipathd[3936]: 3624a9370131099fcace44d860002de47: load table [0 10737418240 multipath
	1 queue_if_no_path 1 alua 1 1 service-time 0 8 1 68:112 1 68:144 1 68:240 1 69:16 1 69:96 1 69:176 1
	69:208 1 70:16 1]

	We don't have a good way to wait until multipathd handles new SCSI
	devices, since we don't know which devices will be discovered by a
	SCSI rescan. But we can use "multipathd show status"[1].

	"multipathd show status" reports "busy: True" when it has uevents to
	process, and "busy: False" otherwise, for example:

	    # multipathd show status
	    path checker states:
	    up                  20

	    paths: 20
	    busy: False

	Unfortunately "multipathd show status" almost never reports
	"busy: True", but it also blocks while processing events, so using it
	for polling is better than sleeping.

	After "udevadm settle" returns, we start polling multipathd, and
	terminate the wait only after 2 consecutive "busy: False" reports, or if
	we timed out.

	With this change, we wait about 2 seconds after a rescan if multipathd
	never reports "busy: True", and about 3 seconds if multipathd reports
	"busy: True" in the first check. While testing importing a storage
	domain with 4 LUNs, we saw about 200 waits with an average wait time
	of 2.21 seconds.
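
	The polling loop can be sketched like this (a minimal sketch; the
	function names and defaults are illustrative, not vdsm's actual
	implementation):

```python
import time

def multipathd_is_busy(status_output):
    # Parse "multipathd show status" output; one line looks like
    # "busy: True" or "busy: False".
    for line in status_output.splitlines():
        line = line.strip()
        if line.startswith("busy:"):
            return line.split(":", 1)[1].strip() == "True"
    return False

def wait_until_ready(run_show_status, timeout=10, interval=0.5):
    # Return True after 2 consecutive "busy: False" reports,
    # or False when the timeout expires first.
    deadline = time.monotonic() + timeout
    idle_polls = 0
    while time.monotonic() < deadline:
        if multipathd_is_busy(run_show_status()):
            idle_polls = 0
        else:
            idle_polls += 1
            if idle_polls == 2:
                return True
        time.sleep(interval)
    return False
```

	Here run_show_status stands for whatever runs "multipathd show
	status" and returns its output as text.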

	To change the maximum wait time (default 10 seconds), administrators can
	install a drop-in configuration file like this on all hosts:

	    $ cat /etc/vdsm/vdsm.conf.d/99-local.conf
	    [multipath]
	    # Wait more time until multipathd is ready after scanning SCSI
	    # devices or connecting to new storage server.
	    wait_timeout = 20

	[1] https://bugzilla.redhat.com/1807050#c4

	Bug-Url: https://bugzilla.redhat.com/2000720

2021-09-10  Vojtěch Juránek  <vjuranek@redhat.com>

	tests: add support for storage type to fake IRS
	To be able to test with various domain storage types, add domain
	types into the fake IRS. This allows us to pretend we run tests on
	block storage for a specified SD, while the actual backend still uses
	common file storage.

	storage: don't update device mapper size on SPM during LV extend
	When the disk is extended during VM migration, data corruption would
	happen if the destination host doesn't update the size of the disk (see
	BZ #1883399). To avoid this situation, we refresh the disk on the
	destination host first to ensure disk size is correct. If the
	destination host doesn't support disk refresh, we stop the disk monitor
	which should prevent any further disk extension and also abort disk
	refresh on the source host. The VM will eventually be paused due to
	running out of disk space.

	However, this is not a complete solution, and data corruption can
	still happen in one specific case: the destination host doesn't
	support disk refresh, the source host is the SPM, and the disk is
	extended only once. Disk extension happens on the SPM, and lvm
	updates the device mapper when running the LV extend command, so the
	SPM actually sees the extension even if we don't run LV refresh on
	the host. Disk monitoring is disabled, so no further extensions will
	happen, but the first one has already happened and is visible to the
	source host (SPM). If the VM writes to this extended area before
	migration finishes, disk corruption will happen, as this area is not
	visible on the destination host.

	To prevent this, add an additional argument, "--driverloaded n", to
	the lvm extend command. This causes the extension not to be visible
	to the system (the SPM host), so it cannot be used by the VM. The
	extension becomes visible only when we refresh the disks from vdsm,
	as if this were an ordinary host.
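
	A sketch of the resulting command line (the helper name and option
	placement are hypothetical; only "--driverloaded n" comes from this
	change):

```python
def build_extend_command(vg_name, lv_name, size_mb):
    # Hypothetical helper, not vdsm's actual code: build an lvextend
    # command with "--driverloaded n" so lvm does not update device
    # mapper on this host; the extension becomes visible only after
    # an explicit LV refresh.
    return [
        "lvextend",
        "--driverloaded", "n",
        "--size", "+%dm" % size_mb,
        "%s/%s" % (vg_name, lv_name),
    ]
```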

	Bug-Url: https://bugzilla.redhat.com/1983882

	tests: unify tmp device size in lvm tests
	The temporary device size differs across lvm tests for no good
	reason: some tests use 10 GiB while others use 20 GiB. Unify the size
	to 10 GiB in all lvm tests.

2021-09-09  Roman Bednar  <rbednar@redhat.com>

	lvm: make function for removing dm devices private
	There is no reason to have an inner function; move it out to make it
	private in the module and relocate it to a more appropriate place.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: raise in _createpv()
	We need to allow raising in _createpv() so callers get access to
	LVMCommandException containing meaningful debug info.

	Needed in followup patches.

	Bug-Url: https://bugzilla.redhat.com/1536880

	devicemapper: allow raising while removing mappings
	This change grants callers access to cmdutils.Error from run() which is
	needed in followup patches.

	Bug-Url: https://bugzilla.redhat.com/1536880

	exception: add a helper for re-raising with traceback
	Current exceptions in lvm module do not show full traceback as
	"raise ... from None" is used to prevent repeating errors.

	Adding a helper to VdsmException that clears __cause__ and uses current
	traceback in with_traceback() to provide a meaningful traceback.

	This is not specific to lvm.py module and can be used anywhere.

	OLD:
	Traceback (most recent call last):
	  File "/vdsm/tests/storage/lvm_test.py", line 277, in test_failed_removevg_cache
	    lvm.removeVG("vg")
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 1462, in removeVG
	    raise se.VolumeGroupRemoveError(e.cmd, e.rc, e.out, e.err) from None
	vdsm.storage.exception.VolumeGroupRemoveError: Volume Group remove error: "cmd=fakecmd rc=5 out=['out'] err=['err']"

	NEW:
	Traceback (most recent call last):
	  File "/vdsm/tests/storage/lvm_test.py", line 277, in test_failed_removevg_cache
	    lvm.removeVG("vg")
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 1462, in removeVG
	    raise se.VolumeGroupRemoveError(e.cmd, e.rc, e.out, e.err).with_exception(e)
	  File "/vdsm/lib/vdsm/common/exception.py", line 47, in with_exception
	    raise self.with_traceback(exc.__traceback__)
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 1457, in removeVG
	    _lvminfo.run_command(cmd, devices=_lvminfo._getVGDevs((vgName, )))
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 564, in run_command
	    raise error
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 539, in run_command
	    return self._runner.run(full_cmd)
	  File "/vdsm/lib/vdsm/storage/lvm.py", line 348, in run
	    rc, out, err = self._run_command(cmd)
	  File "/vdsm/tests/storage/lvm_test.py", line 199, in _run_command
	    raise se.LVMCommandError("fakecmd", 5, ["out"], ["err"])
	vdsm.storage.exception.VolumeGroupRemoveError: Volume Group remove error: "cmd=fakecmd rc=5 out=['out'] err=['err']"
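
	A minimal sketch of such a helper (the class body is illustrative;
	the real VdsmException lives in lib/vdsm/common/exception.py):

```python
class VdsmException(Exception):

    def with_exception(self, exc):
        # Clear __cause__, set by "raise ... from None", and re-raise
        # this exception with the original exception's traceback so
        # the full chain of frames is preserved.
        self.__cause__ = None
        raise self.with_traceback(exc.__traceback__)
```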

	Bug-Url: https://bugzilla.redhat.com/1536880

2021-09-07  Ales Musil  <amusil@redhat.com>

	spec: Bump version of ovirt-openvswitch
	To avoid confusion, require ovirt-openvswitch, which provides the
	correct versions of OvS and OVN for the oVirt project.

2021-09-07  Sandro Bonazzola  <sbonazzo@redhat.com>

	automation: fixed yamllint errors

2021-09-06  Ales Musil  <amusil@redhat.com>

	net, automation: Remove outdated systemd setup from container
	Podman sets up systemd automatically if the container
	entry point is init or systemd.

2021-08-31  Liran Rotenberg  <lrotenbe@redhat.com>

	snapshot: don't invoke abort to non-memory
	The abort mechanism exists mostly for the period when we get stuck on
	the pivot (the libvirt snapshot call). Without memory, the operation
	doesn't take a long time to execute, as there is no memory dump. If
	we hit the abort timeout with no memory snapshot, it means we got
	stuck on other operations, such as freezing or thawing, which we
	can't abort. Therefore the abort brings no benefit in the no-memory
	snapshot flow and is removed from this flow.

	Bug-Url: https://bugzilla.redhat.com/1985973

2021-08-31  Eyal Shenitzky  <eshenitz@redhat.com>

	spec: update qemu-kvm requirement
	Bump qemu-kvm version to fix paused VM while writing
	on resized disk.

	Bug-Url: https://bugzilla.redhat.com/1996602

2021-08-30  Ales Musil  <amusil@redhat.com>

	net: Add retry to dumping of bond options
	With newer kernels (>= 4.18.0-338) there seems to be a race within
	the bonding driver. A bond created via sysfs can hang in a state
	where the device exists but the sysfs infrastructure for it is
	missing:

	ls: cannot access '/sys/devices/virtual/net/bond_test': No such file or directory
	d??????????  ? ?    ?                               0 ?            ? bond_test

	That causes vdsm-tool dump-bonding-options to fail when opening the
	sysfs bond mode file, which should always be present.

	To prevent that, add a retry that checks whether the bond mode sysfs
	file is valid; if not, remove the bond and create it again. This
	makes the driver responsive again.
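
	The retry can be sketched as follows (function and parameter names
	are hypothetical; the callables stand in for reading the sysfs mode
	file and for removing/re-creating the bond):

```python
def dump_bond_mode(read_mode_file, recreate_bond, retries=3):
    # read_mode_file() returns the content of the bond's sysfs mode
    # file (e.g. /sys/class/net/<bond>/bonding/mode) or raises OSError
    # when the sysfs tree is missing; recreate_bond() removes the bond
    # and creates it again, which unwedges the driver.
    last_error = None
    for _ in range(retries):
        try:
            return read_mode_file().strip()
        except OSError as e:
            last_error = e
            recreate_bond()
    raise last_error
```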

	Bug-Url: https://bugzilla.redhat.com/1998865

2021-08-30  Milan Zamazal  <mzamazal@redhat.com>

	New version: 4.50

2021-08-27  Marcin Sobczyk  <msobczyk@redhat.com>

	osinfo: Remove unnecessary '.decode' calls
	The 'rpm' library for a long time shipped a hack that added a dummy
	'decode' method on strings as a compatibility layer for codebases
	migrating from py2 to py3. This hack seems to have been removed in
	the latest versions, so we should adapt the code to the fact that
	the returned values are already strings.

2021-08-17  Nir Soffer  <nsoffer@redhat.com>

	ci: Use el8 hosts
	Lots of tests using qemu-img time out when running on CentOS Stream 9
	nodes. Ehud suggested using host-distro: same to avoid these hosts, but
	there are no el8stream hosts:
	https://jenkins.ovirt.org/label/el8stream/

	But we have el8 hosts:
	https://jenkins.ovirt.org/label/el8/

	Change host-distro to el8 to run on the available hosts.  We will
	probably change this later to el8stream when we will have such hosts in
	the CI.

	tests: Always use backingFormat
	When using "qemu-img create" or "qemu-img rebase" with a backing file,
	we must always specify the backing format. Using backing file without
	specifying the format is insecure and was deprecated long time ago. This
	became a hard error in qemu-img 6.1.

	In the real code we always specify the backing format, but in the tests
	we skipped this in some places. Fix the tests and test infrastructure to
	always specify the backing format.

	tests: Improve failing tests
	test_wrong_backing_file and test_unexpected_backing_file time out in the
	CI in:

	    File "tests/storage/hsm_test.py", line 136, in test_wrong_backingfile
	       op.run()
	    File "lib/vdsm/storage/operation.py", line 82, in run
	       out, err = self._proc.communicate()
	    File "/usr/lib64/python3.6/subprocess.py", line 863, in communicate
	       stdout, stderr = self._communicate(input, endtime, timeout)
	    File "/usr/lib64/python3.6/subprocess.py", line 1534, in _communicate
	       ready = selector.select(timeout)
	    File "/usr/lib64/python3.6/selectors.py", line 376, in select
	       fd_event_list = self._poll.poll(timeout)

	The tests time out when running qemuimg.rebase(unsafe=True). This is a
	fast metadata operation that takes no time on a real system.

	I don't see any reason for this timeout, but we try to rebase on an
	empty file, and we don't specify the image and backing file format. In
	this case qemu has to probe the image format and the backing file
	format, and maybe there is a bug in qemu-img when rebasing on an empty
	image.

	Change the test to rebase on a non-empty raw image, and specify the
	image and backing file format properly.

	tests: Use pytest.timeout instead of py-watch
	py-watch has many issues:

	1. Working at the test suite level, which needs endless maintenance
	   when adding new tests and makes the test suite slower.

	2. Because of 1, it does not react quickly enough when a test deadlocks.
	   We have to wait 10 minutes (20 in storage tests) until we time out.

	3. It requires installing python debuginfo packages, which slows down
	   test setup.

	4. Even when debuginfo packages are installed, we usually do not get
	   a backtrace anyway.

	5. This is another infrastructure we invented that nobody else is using,
	   and we have to maintain it.

	Instead of py-watch, use the pytest-timeout plugin, maintained by the
	python community. This works at the test level, so we can fail
	quickly when a test deadlocks, and no maintenance is needed when
	adding new tests. We always get a python backtrace when the test
	process is killed.

	The disadvantage is that we don't get a backtrace of the underlying
	python C code. But in the last 8 years we needed it only once or twice.
	For these cases we can manually run specific tests via gdb.
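
	With pytest-timeout the limit moves to the test level; an
	illustrative ini configuration (values are examples, not the
	project's actual settings):

```ini
# Fail any single test that runs longer than 600 seconds,
# dumping a Python backtrace when the test process is killed.
# Individual tests can override this with @pytest.mark.timeout(...).
[pytest]
timeout = 600
timeout_method = signal
```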

	py-watch is still used by the legacy nose tests, so we don't delete
	it or the required debuginfo install step.

2021-08-16  Shani Leviim  <sleviim@redhat.com>

	nbd: changing timeout on starting nbd server
	Usually, qemu-nbd starts in 20 milliseconds, so one second seems like
	a good timeout, but in some overloaded environments (like OST) one
	second is not always enough, so the nbd server reaches the timeout.

	This patch changes the timeout to 10.0 seconds.

	Bug-Url: https://bugzilla.redhat.com/1993085

2021-08-16  Milan Zamazal  <mzamazal@redhat.com>

	New version: 4.40.90

2021-08-11  Nir Soffer  <nsoffer@redhat.com>

	contrib: Allow reusing existing target directory
	Add an --exists flag to allow creating a new target using an existing
	directory. A typical use case is to add a disk with the target data
	to another VM.

2021-08-11  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.80.5

2021-08-11  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: refresh guest disk info after disk hotplug
	Our normal refresh interval for filesystem and disk information is
	5 minutes. This means that after disk hotplug it can take up to
	5 minutes before we reflect the guest side changes. This is not really
	user friendly and users expect to see the information sooner.

	To make sure the information is available sooner, we store the time
	of the disk hotplug and perform a re-check on the next poller run
	after hotplug + 10 seconds.

	Bug-Url: https://bugzilla.redhat.com/1967413

2021-08-10  Vojtech Juranek  <vjuranek@redhat.com>

	storage: remove unused arguments from HSM.createStoragePool()
	Arguments `poolType` and `lockPolicy` were unused in
	HSM.createStoragePool(). Remove these arguments.

	storage: remove lockPolicy argument from HSM.reconstructMaster()
	Argument `lockPolicy` is not used in HSM.reconstructMaster() method.
	Remove this argument.

	storage: remove spUUID argument from task related HSM methods
	HSM methods related to tasks don't need the spUUID argument. It's not
	used in any call, and the vdsm API specification doesn't use it
	either. Remove this argument.

	storage: fix HSM.discoverSendTargets() doc string
	Remove question marks and provide more details in the doc string.
	Corresponding vdsm API definition is
	ISCSIConnection.discoverSendTargets.

	storage: remove unused HSM method
	HSM.cleanupUnusedConnections() doesn't do anything and isn't used
	anywhere. Remove it.

	storage: remove options argument from HSM methods
	There are many HSM methods where the `options` argument is not used
	and not defined by the vdsm API (vdsm-api.yml). Remove this argument.

2021-08-09  Eyal Shenitzky  <eshenitz@redhat.com>

	spec: update qemu-kvm requirement
	Bump qemu-kvm version to fix deadlock when rebooting the guest
	during backup operation.

	Bug-Url: https://bugzilla.redhat.com/1892681

2021-08-05  Vojtech Juranek  <vjuranek@redhat.com>

	storage: remove unused option parameter
	Remove the unused `option` parameter from the extendVolume() method.
	This parameter is not used and is not defined in vdsm-api.yml.

	storage: remove unused isShuttingDown parameter
	Remove unused `isShuttingDown` parameter from extendVolume() method.

	virt: skip VM resume in cases when it's not supported
	After a disk extend we always call Vm.cont() to eventually resume the
	VM if it was paused due to ENOSPC before the disk extend finished.
	However, the VM can be in a state where resume is not supported and
	its status shouldn't be changed (e.g. the VM is being migrated or is
	shutting down). Check the VM status before calling resume and skip it
	in these cases.

	Bug-Url: https://bugzilla.redhat.com/1981079

	virt: move condition when VM can be resumed into dedicated method
	This method will be re-used later to check whether Vm.cont() can be
	safely called.

	virt: don't log error during disk extension when VM is not running
	When disk extension is requested, setting the new disk threshold can
	fail, as the VM may not be running by that time. E.g. in case of VM
	migration this can happen when disk extension is requested before the
	migration is finished but the migration finishes before the disk
	extension; in this case the source VM is already shut down. See the
	stack trace below to make it clearer (note: the stack trace is not
	present in the log; it was added by modifying test code, only the
	error message is present in the logs).

	Don't log an error when the libvirt domain is already shut down, as
	it's expected that the disk threshold cannot be set in such a case.
	Here it should be safe to ignore libvirt VIR_ERR_OPERATION_INVALID,
	see
	https://listman.redhat.com/archives/libvirt-users/2021-August/msg00002.html

	Example trace:

	2021-07-14 15:58:46,438-0400 ERROR (mailbox-hsm/3) [virt.vm] (vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') Failed to set block threshold for drive 'sdc' (): Requested operation is not valid: domain is not running (drivemonitor:134)
	Traceback (most recent call last):
	  File "/usr/lib/python3.6/site-packages/vdsm/virt/drivemonitor.py", line 122, in set_threshold
	    self._vm._dom.setBlockThreshold(drive.name, threshold)
	  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
	    ret = attr(*args, **kwargs)
	  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
	    ret = f(*args, **kwargs)
	  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
	    return func(inst, *args, **kwargs)
	  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2609, in setBlockThreshold
	    raise libvirtError('virDomainSetBlockThreshold() failed')
	libvirt.libvirtError: Requested operation is not valid: domain is not running

	Bug-Url: https://bugzilla.redhat.com/1981079

2021-08-04  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.80.4

2021-08-03  Tomáš Golembiovský  <tgolembi@redhat.com>

	numa: use a list of strings in the namedtuple definition
	This is a purely cosmetic change to be consistent with the
	surrounding definitions. The result is functionally the same.

	virt: report CPU topology
	Report detailed information about the CPU topology. The libvirt
	capabilities XML contains topology information in the following form:

	    <topology>
	        <cells num='2'>
	          <cell id='0'>
	            <cpus num='4'>
	              <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0-1'/>
	              ...
	            </cpus>
	          </cell>
	          ...
	        </cells>
	    </topology>

	We parse the content so that we can pass it to Engine.
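
	Parsing such a fragment can be sketched with the standard library (a
	minimal sketch; the real parser lives in vdsm's numa/caps code and
	its output format may differ):

```python
import xml.etree.ElementTree as ET

CAPS_XML = """
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='1'>
        <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0-1'/>
      </cpus>
    </cell>
  </cells>
</topology>
"""

def parse_cpu_topology(xml_text):
    # Map each cpu id to its NUMA cell and socket/die/core placement.
    topology = {}
    root = ET.fromstring(xml_text)
    for cell in root.iter("cell"):
        for cpu in cell.iter("cpu"):
            topology[int(cpu.get("id"))] = {
                "cell": int(cell.get("id")),
                "socket": int(cpu.get("socket_id")),
                "die": int(cpu.get("die_id")),
                "core": int(cpu.get("core_id")),
            }
    return topology
```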

	We now query the libvirt capabilities on every Host.getCapabilities
	call. Because the results of numa._numa() are cached and the cache is
	never cleared, this may lead to VDSM consuming, and never freeing, a
	lot of memory in the long run. This issue will be addressed in a
	separate patch.

2021-08-02  Liran Rotenberg  <lrotenbe@redhat.com>

	snapshot: don't consider metadata without recovery
	When we execute a snapshot we set the details of the snapshot job in
	the VM metadata. Unfortunately, we read this metadata regardless of
	the flow we are in. That means that if we failed to execute a
	snapshot, we will get the metadata from the VM when doing another
	snapshot.

	With this patch we consume the snapshot metadata from the VM only in
	recovery mode.

	Bug-Url: https://bugzilla.redhat.com/1984209

2021-07-29  Roman Bednar  <rbednar@redhat.com>

	lvm: use run_command() in resizePV() flow
	Modify resizePV() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in changelv() flow
	Modify changelv() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in movePV() flow
	Modify movePV() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: use run_command() in removeVG() flow
	Modify removeVG() flow to use new run_command() which now raises
	LVMCommandError.

	The exceptions raised in this flow now inherit from LVMCommandError and
	provide more details for better debugging.

	Bug-Url: https://bugzilla.redhat.com/1536880

	exceptions: increase consistency for parameter name
	Use a more consistent name for the LVMCommandError exception,
	renaming the command parameter to cmd (see cmdutils.Error).

2021-07-29  Nir Soffer  <nsoffer@redhat.com>

	tests: Convert test class to functions
	Using functions simplifies the code and makes the test output easier
	to work with.

	tests: Remove empty TestReplication class
	This class had a long comment about required replication tests, and
	no tests. It looks like it was a preparation for adding replication
	tests that were never added.

	The base class for the tests includes a check function that seems to
	check replication, so the comments about replication may be stale now.

	Remove the empty class and keep the comments in the module docstring for
	later inspection.

	Now that we have only one test class, there is no need for the base
	class, so merge both into the same class.

	tests: Convert drive_extension_test to pytest
	Convert the test to pytest, replacing context manager with fixture and
	permutations with parameters.

	tests: Simplify fake vm creation
	Previously we used a context manager to create the fake vm since we need
	to monkeypatch Drive class attributes during the test, and creating test
	vm was complicated and required modifying multiple objects.

	Simplify by moving the complicated setup code into FakeVM.__init__().
	Now we can create a vm easily using a list of (drive_conf,
	block_info) tuples.

	Move vm creation into the test, so the context manager handles only
	Drive monkey-patching.

	tests: Improve parameters formatting
	Format the parameters properly to make the test context easy to
	understand. Add a helper for creating the block_info dict to avoid
	repeating the same values and make the parameters self-explanatory.

	tests: Use drive config index
	Instead of generating index dynamically, use the index value in the
	drive conf dict. This will make it easier to simplify the way we create
	test vm in the next patches.

2021-07-28  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.80.3

2021-07-27  Saif Abu Saleh  <sabusale@redhat.com>

	api: virt: remove memAvailable and memCommited from vdsm
	This patch removes memAvailable and memCommited from HostStats
	model, and from host API, since they are not needed anymore
	by the engine in all supported versions:
	memCommitted is not consumed from vdsm by engine > 4.1
	engine > 4.1 can handle the case the deprecated memAvailable
	is not reported

	Bug-Url: https://bugzilla.redhat.com/1757689

2021-07-23  Nir Soffer  <nsoffer@redhat.com>

	tests: Clean up drive extension test setup
	Replace make_env(), returning vm, dom, drives, with make_vm() returning
	a vm ready to test. This is a preparation for replacing the context
	manager with a pytest fixture.

	We can get the drives from the vm using getDiskDevices() and access the
	dom using vm._dom. Accessing the private _dom is ugly, but this is
	fine for the test.

	tests: Minor cleanup in drive_extension_test
	Move setting the drive size out of make_drive() so it only creates a
	drive. Setting the drive size can be done after creating a drive.

	Change vmfakelib.FakeClientIF() to accept an IRS() object so there is
	no need to override the default one after it is created.

	tests: Import constants from storage module
	Usually we import only modules to make it easier to understand where
	names are coming from. In a test context it is more helpful to import
	the constants so we don't have to repeat the module name.

2021-07-22  Roman Bednar  <rbednar@redhat.com>

	add helper script for running storage test env
	Getting a container test environment is currently not straightforward.

	Add a helper script which starts a container from existing image used
	in CI, similar to what travis uses.

	Currently root permissions are unfortunately required in order to set
	correct labels on /run/udev; running as a regular user fails with:

	Error: failed to set file label on /run/udev: operation not permitted

	Tested with: podman-3.0.1-7.module+el8.4.0+11311+9da8acfb.x86_64

2021-07-22  Nir Soffer  <nsoffer@redhat.com>

	tests: Inline constant used once
	_MONITORED_DRIVES was used by only one test, so we can inline it in
	the test's pytest.mark.parametrize() call. This keeps the data near
	the test, which helps to understand it.

	tests: Use functions instead of class
	Convert the test class to simple functions. This simplifies the
	tests, removing the unhelpful `self` argument, decreasing
	indentation, and making the test names shorter.

	Now the tests are short enough to paste in a commit message. This is the
	longest test name:

	    virt/drivemonitor_test.py::test_monitored_drives[replicate_network_to_block] PASSED

	tests: Merge monitored_drives tests
	We had 2 tests for DriveMonitor.monitored_drives(), each using a list
	of parameters and calling the same _check() helper. Merge the lists
	of test cases, and merge the tests and the helper into a single test.

	tests: Convert drivemonitor test to pytest
	Convert test parameters to pytest.param() so we can use an id instead
	of a comment to describe the test case. This makes the test names
	nicer:

	    $ tox -e virt tests/virt/drivemonitor_test.py -- -v
	    ...
	    test_monitored_drives[non_chunk_drives] PASSED

	It also makes it possible to select tests by test id:

	    $ tox -e virt tests/virt/drivemonitor_test.py -- -k _disabled
	    ...
	    test_monitored_drives_flag_disabled[both_drives_enabled] PASSED
	    test_monitored_drives_flag_disabled[first_drive_disabled] PASSED
	    test_monitored_drives_flag_disabled[second_drive_disabled] PASSED
	    test_monitored_drives_flag_disabled[both_drives_disabled] PASSED

	Note that I trimmed the real output from pytest, since the long test
	names do not play well with gerrit UI. The next patch will improve this
	issue.

	Reformat the drive configuration properly to make it much more readable.

	We used VdsmTestCase only for its logger, but we can use the fake vm
	logger instead.

	tests: Remove unneeded helper
	Now that we have only one configuration to test (using block threshold
	events), we don't need to monkeypatch anything, so there is no need to
	complicate the tests with make_env() context manager.

2021-07-21  Marcin Sobczyk  <msobczyk@redhat.com>

	New release: 4.40.80.2

2021-07-21  Ales Musil  <amusil@redhat.com>

	net: Replace dispatcher dhcp monitor with netlink monitor
	nmstate allows us to define directly in which route table a new
	dynamic route will be defined. As a consequence we no longer need
	to monitor the route itself; we can just monitor when the IP
	address from DHCP is bound to the interface. This is used for
	engine notification and at the same time for IPv4 source route
	rules, because the route itself is added to the correct table by
	NM.

	net, tests: Move parametrize_ip_families into nettestlib

	net: Remove ip route and rule module
	This module is not used anymore after migrating
	source routing to nmstate.

	net: Remove sourceroute module
	Source routing is now handled by nmstate vdsm module, we can
	remove the old source route module.

	net: Use nmstate for dynamic source routing
	Dynamic source routing is done via an auto route table id, which
	ensures that the default route from DHCP will be set in the
	corresponding table. The only thing vdsm has to ensure is the
	addition of route rules. This is done via DHCP monitoring.

	Bug-Url: https://bugzilla.redhat.com/1962563

	net: Move next_hop_interface helper to NetworkConfig class

	net: Use nmstate for static source routing
	Start using nmstate for static source routing setup.

	Bug-Url: https://bugzilla.redhat.com/1962563

2021-07-21  Lev Veyde  <lveyde@redhat.com>

	tool: sebool: Show warning if SELinux is disabled
	Previously vdsm-tool silently ignored the SELinux related
	configuration steps when it thought that SELinux was disabled.
	That also made it harder to debug.

	This patch adds a warning message, letting the user know about the
	disabled SELinux.

2021-07-21  Nir Soffer  <nsoffer@redhat.com>

	drivemonitor: Always use block threshold events
	When we added support for block threshold events in 2017, we made it
	configurable so we could disable the feature if we found an issue in
	production. After using this for 4 years, we are pretty confident in
	block threshold events, and we are pretty sure that disabling them
	does not really work, since nobody is testing this.

	Removing the configuration simplifies the code and the tests, removing
	many tests verifying the old behavior, and simplifying the test helpers.

	This change only removes the configuration, additional cleanup will
	follow.

	https://bugzilla.redhat.com/1913387

	tests: Minor cleanups in drivemonitor test
	Preparing to convert the test to pytest:
	- Simplify make_env()
	- Add whitespace to make the code more clear
	- Add comments for test permutations

2021-07-19  Lev Veyde  <lveyde@redhat.com>

	tool: lvm: Fix small typo

2021-07-19  Nir Soffer  <nsoffer@redhat.com>

	vm: Always use index for setting threshold
	Always use the indexed name when setting block threshold. This does not
	eliminate the double block threshold events[1], but we can drop the
	event for the drive name safely.

	Update the tests to verify that we always register events for indexed
	name.

	Update the tests to trigger the double events like libvirt and verify
	that we ignore the event with the drive name and handle the event for
	the indexed name.

	[1] https://bugzilla.redhat.com/1983429

	Bug-Url: https://bugzilla.redhat.com/1948177
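
	The indexed-name convention can be sketched like this (helper names
	are hypothetical; only the "vda[7]" form comes from libvirt):

```python
import re

# libvirt reports block threshold events for a plain drive name
# ("vda") and for the indexed name ("vda[7]"); only the indexed
# form is registered and handled.
_INDEXED = re.compile(r"^(?P<name>\w+)\[(?P<index>\d+)\]$")

def indexed_name(name, index):
    # Format the indexed target name used when setting the threshold.
    return "%s[%d]" % (name, index)

def parse_block_threshold_target(target):
    # Return (drive_name, index) for an indexed target, or
    # (target, None) for a plain-name event, which can be dropped.
    m = _INDEXED.match(target)
    if m:
        return m.group("name"), int(m.group("index"))
    return target, None
```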

	vm: Move up handling of NotConnectedError
	If a VM is not running when calling DriveMonitor.set_threshold(), we
	swallowed the error. This makes it harder to avoid bogus errors and
	warnings in the rest of the code.

	We want to call Vm._drive_get_actual_volume_chain() in
	Vm._update_drive_volume_size(). This call may fail in the same way when
	the VM is not running when self._dom is virdomain.Defined() or
	virdomain.Disconnected(). We can handle NotConnectedError around this
	call, but handling this error everywhere is not an attractive
	solution.

	Handling expected conditions like non-running VM should be done at the
	top level, since we don't have anything to do except logging a debug
	message.

	Change DriveMonitor.set_threshold() to handle any error and unset the
	drive threshold_state, but raise the error so callers can handle it.
	This is more correct and removes the dependency on the virdomain module.

	Handle NotConnectedError in Vm.refresh_disk() around the call to update
	drive volume size, so we don't fail the call during migration.

	Handle NotConnectedError in Vm.after_volume_extension(), so we abort the
	extend completion flow once we detect that the VM is not running. This
	also skips the call to resume the VM, which logs other bogus warnings.

	Bug-Url: https://bugzilla.redhat.com/1948177

	tests: Add missing state for extending drives
	We want to always use the index when setting block threshold, but the
	current test rig does not build enough fake state to do this.

	To use the index we need to find the drive's actual volume chain, using
	the drive alias. So we need to set drive.alias when creating a drive.

	To look up a drive by alias, we need to have the right disk xml in the
	fake domain. This xml must also have the index and path.

	To find the drive volume id, we need the drive volumeChain. We generate
	a partial volume chain with enough info to satisfy the code using it.

	Drive.index is not related to the libvirt index reported in the xml,
	but for simplicity we use the same value, since we already abuse the
	index for creating a unique path.

	To manage disk xml, the fake domain now keeps the devices xml element.
	This allows easy modification of the libvirt xml for adding and
	removing drives or simulating changes in the libvirt xml.

	Bug-Url: https://bugzilla.redhat.com/1948177
	Bug-Url: https://bugzilla.redhat.com/1913387

	vm: Extract _drive_volume_index()
	We want to use the indexed name for both setting and clearing the block
	threshold. Extract the code to look up the index for a drive volume
	from clear_threshold() to a helper, so we can use it when setting the
	threshold.

	Bug-Url: https://bugzilla.redhat.com/1948177

	drivemonitor: Allow setting threshold with index
	When setting threshold for a normal drive, we always use the top layer,
	so we register threshold for the drive name (vda). In historic libvirt
	versions, there was no other way to refer to the top layer, but now we
	have an indexed name also for the top layer.

	When registering the top layer using the drive name, recent libvirt
	(reported in 6.6) submits the block threshold event twice: once for the
	registered name (vda) and once for the indexed name (vda[7]).

	The unexpected event logs a warning like:

	    2021-04-10 08:12:10,550+0800 WARN  (libvirt/events) [virt.vm]
	    (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Unknown drive 'sdi[7]' for
	    vm 7af5361c-012b-41f9-9ea9-33917df23f21 - ignored block threshold event
	    (drivemonitor:177)

	Submitting the event twice is likely a libvirt bug, but the best way
	to avoid it is to always use the indexed name in vdsm.

	Using indexed name is also the only way we can monitor block threshold
	on scratch disks and block copy destination image.

	This change adds an optional index argument to set_threshold(), similar
	to the index argument in clear_threshold(). This allows callers to start
	using index also for drive top layer.

	Using an index, we will get only one event for the indexed name,
	eliminating the warning. However, since we get an indexed name, we must
	parse it into the drive name and index to locate the drive.
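
	Parsing the indexed name back into a drive name and index can be
	sketched like this (the regex and helper name are illustrative):

```python
import re

# Matches indexed names like "vda[7]" reported by libvirt events.
_INDEXED_NAME = re.compile(r"^(?P<name>\w+)\[(?P<index>\d+)\]$")


def parse_target(name):
    """Return (drive_name, index); index is None for a plain name."""
    m = _INDEXED_NAME.match(name)
    if m:
        return m.group("name"), int(m.group("index"))
    return name, None
```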

	A minor issue is that we cannot log the path, since we don't have a way
	to get the path related to the libvirt index. We may add this later to
	improve logging.

	Bug-Url: https://bugzilla.redhat.com/1948177
	Bug-Url: https://bugzilla.redhat.com/1913387

	virt: storage: Support top volume index
	Since libvirt 5.1.0[1] we also have an index for the top layer. Update
	the disk xml parsing to extract the index from the source element.

	Since we have to get the index from the source element for the top
	volume, and from the backingStore element for the backing chain, I had
	to rewrite the parsing code.

	The new parsing code is more strict, enforcing the libvirt domain xml
	format. For example, we used to allow a missing <backingStore/>
	element, but the libvirt domain xml does not specify this element as
	optional.

	With this change, we always get a target name for the top volume, so in
	blockCopy we always use indexed name instead of None.

	[1] https://bugzilla.redhat.com/1451398
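
	The parsing described above can be sketched with the standard
	library; the xml below is a simplified hypothetical example of a
	libvirt disk element, not vdsm's actual parser:

```python
import xml.etree.ElementTree as ET

# Simplified disk element: with libvirt >= 5.1.0 the <source> of the
# top layer carries an index attribute, and each <backingStore> in the
# chain carries one as well.
DISK_XML = """
<disk type='file' device='disk'>
  <source file='/top.qcow2' index='1'/>
  <backingStore type='file' index='2'>
    <source file='/base.qcow2'/>
    <backingStore/>
  </backingStore>
  <target dev='vda'/>
</disk>
"""


def parse_chain_indexes(disk_xml):
    """Return the volume chain indexes, top volume first."""
    disk = ET.fromstring(disk_xml)
    indexes = [int(disk.find("source").get("index"))]
    node = disk.find("backingStore")
    # An empty <backingStore/> (no index) terminates the chain.
    while node is not None and node.get("index") is not None:
        indexes.append(int(node.get("index")))
        node = node.find("backingStore")
    return indexes
```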

	Bug-Url: https://bugzilla.redhat.com/1948177
	Bug-Url: https://bugzilla.redhat.com/1913387

	virt: storage: Remove unhelpful allocation field
	VolumeChainEntry.allocation is always None. This is a leftover from
	very old code expecting allocation information in the libvirt xml that
	was never available.

	livemerge: Remove unneeded variable
	Drive alias is useful for logging when looking up the drive volume
	chain, but we don't need to keep a local variable. Using drive.alias is
	clearer and minimizes state.

	livemerge: Don't log and raise
	Logging and raising the same error is a common anti-pattern that makes
	debugging harder. Replace the detailed error log and the exception with
	an unhelpful message by an exception with a detailed error message.

	vm: Improve logs when getting actual chain
	Unify the error messages when getting the drive's actual volume chain
	fails.

	Fix error message formatting when raising; we raised a tuple (format,
	alias) instead of a formatted message.
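
	A minimal illustration of the formatting bug (the message text and
	alias are hypothetical):

```python
alias = "ua-1234"

# Bug: passing the format string and its argument as two values makes
# the exception "message" a tuple, not a formatted string.
broken = RuntimeError("Unable to get volume chain for drive %s", alias)

# Fix: format the message before raising.
fixed = RuntimeError("Unable to get volume chain for drive %s" % alias)
```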

	vm: Simplify drive_get_actual_volume_chain()
	We always want a single drive volume chain, but the function accepts a
	list of drives, which makes it harder to use. Simplify it to do what we
	need.

	drivemonitor: Simplify check for events
	Unify the way we check for events in all methods.

	vm: Remove pointless getVolumeSize() call
	When refreshing a disk, we called getVolumeSize() twice, once before
	updating drive volume size, and once after the update. The second call
	is not needed since the volume size cannot be changed by
	Vm._update_drive_volume_size(). Even if we are running on the SPM, and
	the volume size was extended again, we care only about the size after
	the refresh.

	tox: Disable flake8 warnings about binary operators
	In vdsm we always break long conditions after binary operators. Looks
	like old flake8 versions suggested this style. Newer flake8 changed the
	rules to complain about this style.

	Both styles are correct, and there is no reason to change good code
	style, so silence the unhelpful warning.

2021-07-19  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: fileSD: Add tests for ip-based SDs
	With [1] we've fixed a problem of scanning for unregistered
	images on reattached ipv6-based storage domains. This was detected
	only in OST, so let's add a parameter to 'TestFileManifest' to run
	fake storage domains in paths that resemble FQDNs, ipv4 and ipv6
	addresses.

	[1] https://gerrit.ovirt.org/c/vdsm/+/115693

2021-07-16  Roman Bednar  <rbednar@redhat.com>

	lvm: let command execution raise
	Add run_command() which will now raise on errors and mask this in cmd()
	so the callers don't break.

	See a followup patch on how run_command() will replace old cmd() calls
	in all the callers. When done cmd() should be removed.
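
	The split can be sketched roughly as follows; this is an assumption
	about the shape of the change, not the actual vdsm lvm module:

```python
class LVMCommandError(Exception):
    """Carries all command details for helpful logging."""

    def __init__(self, cmd, rc, out, err):
        self.cmd, self.rc, self.out, self.err = cmd, rc, out, err
        super().__init__(
            "%s failed rc=%d out=%r err=%r" % (cmd, rc, out, err))


def run_command(args, runner):
    # New interface: raise on errors instead of returning rc.
    rc, out, err = runner(args)
    if rc != 0:
        raise LVMCommandError(args, rc, out, err)
    return out


def cmd(args, runner):
    # Old interface: mask the error so existing callers that check
    # the rc themselves don't break.
    try:
        return 0, run_command(args, runner), ""
    except LVMCommandError as e:
        return e.rc, e.out, e.err
```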

	Bug-Url: https://bugzilla.redhat.com/1536880

	lvm: make commands raise early for better debugging
	Raising LVMCommandError in LVMRunner.run() will simplify cmd() callers,
	so they don't need to check the rc. This also makes the code more
	robust, since a caller can't forget to check the return value.

	Also the error contains all details for more helpful logging.

	Bug-Url: https://bugzilla.redhat.com/1536880

	tests: lvm: use proper rc for warning suppression tests
	Using a 0 return code for tests where we suppress warning messages is
	not correct. Change this to use a non-zero rc to better reflect the
	real case.

2021-07-15  Vojtech Juranek  <vjuranek@redhat.com>

	virt: don't raise when setting disk threshold and VM is not running
	If we refresh the disk during migration, setting the threshold may fail
	if the destination VM is not running yet. The same situation can happen
	when the VM is shutting down. Skip setting the disk threshold when the
	underlying libvirt domain is not connected yet and set it to
	BLOCK_THRESHOLD.UNSET, so it will eventually be set by the periodic job
	once the VM is running. This is the case once migration is finished;
	disk thresholds are set by the periodic job right after it.

	Bug-Url: https://bugzilla.redhat.com/1981307

2021-07-14  Marcin Sobczyk  <msobczyk@redhat.com>

	fileSD: ipv6: Add missing 'glob_escape'
	Mount point paths for ipv6-based storage domains can contain
	'[]' brackets, which messes up glob patterns. To avoid this
	one needs to escape the special characters before globbing.

	We're missing the escaping in 'FileStorageDomainManifest.getAllImages'
	method, which breaks scanning for unregistered disks after reattaching
	a storage domain.
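
	The fix boils down to escaping the mount point before globbing; the
	path below is an illustrative ipv6-based mount point, not a real
	vdsm path:

```python
import glob
import os

# '[' in an ipv6-based mount point starts a glob character class, so
# the raw path would not match itself. glob.escape() neutralizes the
# special characters before the pattern is built.
mount_point = "/rhev/data-center/mnt/[fd00::1]:_export_domain"
pattern = os.path.join(glob.escape(mount_point), "*", "images")
```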

2021-07-14  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.80.1

2021-07-14  Ales Musil  <amusil@redhat.com>

	net: Switch logging to lazy formatting
	Logging formatting is lazy by default; switch to this formatting
	instead of using other methods that are evaluated even if the log
	message is not emitted, e.g. due to a lower level than configured.
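
	The difference can be demonstrated with an object whose string
	rendering is costly (the names here are illustrative):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("net")


class Report:
    """Object whose string rendering is costly; counts renderings."""

    renders = 0

    def __str__(self):
        Report.renders += 1
        return "long report"


report = Report()

# Eager formatting: the message is rendered up front, even though
# DEBUG records are filtered out.
log.debug("state: %s" % report)

# Lazy formatting: arguments are only formatted if the record is
# emitted, so with DEBUG disabled __str__ is never called here.
log.debug("state: %s", report)
```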

2021-07-13  Ales Musil  <amusil@redhat.com>

	net, tests: Unify nic naming in integration tests
	- Instead of having the fixed bridge name "br1" in bridge_test,
	use a randomized name with the prefix vdsm-. The hebrew bridge in
	ipwrapper_test already had a randomized suffix; add the vdsm- prefix
	so it is clearer that the bridge was created by tests.
	Update the regex for cleaning up leftover devices to minimize the
	risk of deleting a device that should be present on the node.

	- Remove the unused test-network constant.

	net, tests: Remove xfail from link_bridge_test
	Now that we've ensured that the node has no leftover nics,
	it should be possible to run those tests again.

	net, tests: Cleanup node from leftover interfaces
	Clean up any interface that might be left on the node from
	previous network test runs.

2021-07-12  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: report VDSM process affinity

2021-07-09  Vojtech Juranek  <vjuranek@redhat.com>

	virt: fix error handling in Vm.refresh_disk() method
	When the exception is thrown from Vm.findDriveByUUIDs(), the drive
	variable used in the except block is not defined. Use the domain and
	volume IDs instead to specify the disk.

	Example exception:

	vdsm.common.exception.DriveRefreshError: Failed to refresh drive: {'vm_id': '2dad9038-3e3a-4b5e-8d20-b0da37d9ef79', 'domain_id': 'cdac2a0c-b110-456d-a988-7d588626c871', 'volume_id': 'c16033d3-9608-4103-a3d4-ab4490f76b1f', 'reason': 'TEST EXCEPTION'}

	Bug-Url: https://bugzilla.redhat.com/1883399

2021-07-09  Ales Musil  <amusil@redhat.com>

	net, tests: Fix wrong indentation of add_bridge_with_stp
	Due to the wrong indentation the test was not running for the
	linux bridge.

2021-07-08  Nir Soffer  <nsoffer@redhat.com>

	backup: Remove last hacks for old libvirt
	Remove backup.cold_backup_supported and hacks to cope with libvirt
	missing the VIR_ERR_CHECKPOINT_INCONSISTENT constant.

	backup: Backup is always enabled
	Remove backup.enabled, @backup.requires_libvirt_support, and test mark
	@requires_backup_support. These were needed when we had to be compatible
	with older libvirt not supporting incremental backup.

	backup: Remove unneeded hack for old libvirt
	We have required libvirt 7.0.0 for a while, so there is no need for
	hacks added for libvirt < 6.6.

2021-07-08  Vojtech Juranek  <vjuranek@redhat.com>

	virt: fix error message when disk refresh fails
	Example exception:

	2021-07-08 06:26:38,032-0400 ERROR (mailbox-hsm/2) [storage.TaskManager.Task] (Task='458c8aac-4746-4bfa-9779-379a716228fe') Unexpected error (task:877)
	Traceback (most recent call last):
	  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
	    return fn(*args, **kargs)
	  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 1568, in after_volume_extension
	    self._refresh_destination_volume(volInfo)
	  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 1536, in _refresh_destination_volume
	    dest_vol_size = self._refresh_migrating_volume(volInfo)
	  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 1451, in _refresh_migrating_volume
	    return self._migrationSourceThread.refresh_destination_disk(vol_pdiv)
	  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 802, in refresh_destination_disk
	    raise exception.CannotRefreshDisk(reason=result["status"]["message"])
	vdsm.common.exception.CannotRefreshDisk: Failed to refresh disk on the destination: {'reason': "General Exception: ('TEST EXCEPTION',)"}

	Bug-Url: https://bugzilla.redhat.com/1883399

	virt: add _switch_state() method
	Add a method for switching the state which logs the state change.

	virt: improve the check when disk should be refreshed
	Improve the check whether the disk should be refreshed on the
	destination host during the migration process. Currently the condition
	is met also when the migration thread waits for the migration
	semaphore, which can result in an unnecessary refresh failure, as the
	VM is not defined on the destination host by that time. The new
	condition is met only when the VM is defined on the destination host.

	For this reason add a new method to the migration thread which returns
	True if the VM is defined on the destination host.

	Also add a new VM method which determines if the disk refresh is
	needed or not. Besides a nicer code flow, it reduces the need to add
	the migration thread into fake VM classes in the tests.

	For recovery flows set _state to STARTED, which further simplifies the
	condition, as only _state needs to be checked.

	Bug-Url: https://bugzilla.redhat.com/1883399

	virt: add PREPARED migration state
	Add another migration state `PREPARED`, which is set before the VM is
	defined on the destination host; the state is moved to `STARTED` once
	the VM is defined on the destination.

	virt: replace SourceThread._failed with state
	After introducing SourceThread._state, we can replace the
	SourceThread._failed flag with _state, as it also contains a FAILED
	state.

	virt: convert PostCopyPhase to Enum
	Convert PostCopyPhase from the migration module to IntEnum. It is more
	consistent with the newly added State enum and also means nicer log
	messages, where e.g. instead of 1 we log PostCopyPhase.REQUESTED.

	virt: introduce migration state
	Introduce a state variable for the migration source thread. It will be
	used instead of the current _started variable and will provide more
	detailed information about the state of the migration. Currently the
	possible values are:

	- INITIALIZED: migration thread was created and initialized
	- STARTED: VM on the destination host was created and migration is
	  about to start
	- FAILED: migration failed

	Later on more states will be added to provide more precise information
	about migration state.
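
	The states above can be sketched with a standard Enum (a simplified
	illustration, not the actual SourceThread code):

```python
from enum import Enum, auto


class State(Enum):
    """Migration source thread states described above."""

    INITIALIZED = auto()  # thread was created and initialized
    STARTED = auto()      # destination VM created, migration starting
    FAILED = auto()       # migration failed


state = State.INITIALIZED
state = State.STARTED  # the VM on the destination host was created
```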

2021-07-08  Ales Musil  <amusil@redhat.com>

	net, tests: Remove xfail from edit_default_gateway for ovs
	This test is fine with ovs, the xfail applied only to the old
	backend but was never removed when we switched to nmstate for ovs.

	net, tests: Remove nmstate mark
	All functional tests are running with nmstate anyway; nmstate as
	backend has been the default for a long time. Remove the nmstate mark
	as there is no point in keeping it.

2021-07-05  Nir Soffer  <nsoffer@redhat.com>

	mailbox: Remove unneeded pylint exclude
	pylint fixed the issue with UUID.int in 2018[1], and pylint 2.4.0 does
	not complain about missing member.

	[1] https://github.com/PyCQA/pylint/issues/961

	misc: Move [un]packUuid to mailbox
	Packing and unpacking UUIDs is an internal implementation detail of
	the mailbox module, so it is better to keep the functions there.

	While moving the function, modernize the names.

	tests: misc: Simplify test UUIDs
	Using string values simplifies the tests and explains better how we
	pack and unpack UUIDs.

	misc: Use int builtin methods
	Use int.to_bytes() and int.from_bytes() instead of reinventing them.
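
	Packing a UUID with the int builtins can be sketched like this (the
	function names are illustrative, not the actual mailbox API):

```python
import uuid


def pack_uuid(value):
    """Pack a UUID string into 16 big-endian bytes."""
    return uuid.UUID(value).int.to_bytes(16, "big")


def unpack_uuid(data):
    """Unpack 16 bytes back into a UUID string."""
    return str(uuid.UUID(int=int.from_bytes(data, "big")))
```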

	tests: network: Use strict=False for flaky tests
	The link bridge tests usually fail but sometimes they succeed. We must
	use xfail(strict=False) to handle such flaky tests.

	tests: blockdev: Skip discard if not supported
	We assumed that blkdiscard always works on a block device, but when
	running a container with podman using overlayfs, blkdiscard is not
	supported:

	    # truncate -s 1g /var/tmp/backing
	    # losetup -f /var/tmp/backing --show
	    /dev/loop4
	    # dd if=/dev/zero bs=1M count=1024 of=/dev/loop4 conv=fsync
	    # blkdiscard /dev/loop4
	    blkdiscard: /dev/loop4: BLKDISCARD ioctl failed: Operation not supported

	Now we skip the tests if blkdiscard fails. Here is an example run:

	    # tox -e storage tests/storage/blockdev_test.py
	    ...

	    storage/blockdev_test.py ........s                          [100%]

	    ==================== short test summary info =====================
	    SKIPPED [1] storage/blockdev_test.py:163: blkdiscard not supported:
	    b'blkdiscard: /dev/loop5: BLKDISCARD ioctl failed: Operation not
	    supported\n'

	tests: mpathconf: Remove unneeded skips
	We haven't tested on Fedora 30 for a few years.

	tests: Mark tests with @requires_selinux
	Some tests segfault in jenkins in:

	storage/mpathconf_test.py ..Fatal Python error: Segmentation fault

	Current thread 0x00007f0052ba5b80 (most recent call first):
	  File "/usr/lib64/python3.6/site-packages/selinux/__init__.py", line 128 in restorecon
	  File "/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/lib/vdsm/storage/fileUtils.py", line 289 in atomic_write

	This looks like a selinux bug, but the same test passes on a real
	centos stream host or in a container. Let's try to skip the tests if
	selinux is not enabled.

	tests: network: Mark tests as xfail on Jenkins
	Some link bridge tests are failing in Jenkins for a week. Mark the tests
	as xfail to unbreak the build until the issue is investigated.

	Both tests fail with this error:

	    vdsm.network.ipwrapper.IPRoute2AlreadyExistsError:
	    (2, ['RTNETLINK answers: File exists'])

	automation: Require NetworkManager
	Without NetworkManager network tests fail in:

	>   raise child_exception_type(errno_num, err_msg, err_filename)
	E   FileNotFoundError: [Errno 2] No such file or directory: 'nmcli': 'nmcli'

	In the container we require NetworkManager and the tests pass.

	automation: Require openssl
	Without openssl, lib tests now fail when trying to generate
	certificates:

	>   raise cmdutils.Error(args, p.returncode, out, err)
	E   vdsm.common.cmdutils.Error: Command ['openssl', 'genrsa', '-des3',
	    '-passout', 'pass:secretpassphrase', '-out',
	    '/tmp/tmpu5d507m9.pass.key', '2048'] failed with rc=127 out=b''
	    err=b'taskset: failed to execute openssl: No such file or
	    directory\n'

	In the container openssl is available, probably because some other
	package requires it. Make the requirement explicit.

	automation: Remove unneeded dnf update
	Since we use centos stream advanced virt repo, we already have
	python3-libvirt and there is no need to update.

	Since the python3-libvirt version is important for virt tests, let's
	log the version to make debugging issues easier in the future.

	automation: Remove advanced-virt hack for centos
	We are using centos stream advanced virt repo so the hack is not needed.

2021-06-30  Milan Zamazal  <mzamazal@redhat.com>

	New version: 4.40.80

	New release: 4.40.70.6

2021-06-30  Vojtech Juranek  <vjuranek@redhat.com>

	virt: disable disk extension during migration when host doesn't support it
	In previous patches support for disk extension during migration was
	implemented. It relies on the capability of the destination host to
	refresh the volume being extended. If the destination host doesn't
	support it, we have to abort the extension, but it will be extended
	again soon, as the drive monitor checks the drives every 2 seconds. To
	avoid the noise in the logs, disable disk extension during migration
	when the destination host doesn't support disk refresh. Once migration
	finishes, enable disk monitoring on the VM again.

	To avoid additional disk refresh RPC calls when the destination host
	doesn't support disk refresh, check the destination host capabilities
	first and remember the result for any other disk refresh call. It saves
	us more RPC calls which would fail anyway and also repeating exceptions
	in the source host log. We cannot simply catch the exception from the
	refresh disk call and ignore it: if refresh disk on the destination
	host is supported but fails for some other reason, we want to repeat
	it, and there isn't any good way to distinguish these cases.

	To make sure the callback is always called in
	Vm.after_volume_extension(), move the callback call into a finally
	block and use sys.exc_info() to pass the exception into the callback
	if there is any.
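
	The flow can be sketched as follows (the function shape and callback
	signature are illustrative):

```python
import sys


def after_volume_extension(refresh, callback):
    try:
        refresh()
    finally:
        # If refresh() raised, sys.exc_info() holds the in-flight
        # exception here; otherwise it is (None, None, None). Either
        # way the callback always runs.
        callback(sys.exc_info())
```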

	Bug-Url: https://bugzilla.redhat.com/1883399

2021-06-29  Vojtech Juranek  <vjuranek@redhat.com>

	virt: refactor after_volume_extension method
	In a follow-up patch I'm going to add more code to this method.
	Refactor the relevant part of it into a smaller method for better
	readability and maintainability.

2021-06-28  Nir Soffer  <nsoffer@redhat.com>

	tests: fileutils: Fix test on python 3.6
	The test wrongly checked for EPERM, while the actual error is EACCES.
	The test does not run in the CI since it requires an unprivileged
	user, and the CI runs as root. Developers usually run the tests with a
	newer python that does not raise in this case. Running the tests on a
	RHEL 8.5 host as an unprivileged user revealed the issue.

2021-06-25  Nir Soffer  <nsoffer@redhat.com>

	qemuimg: Remove hacks for --target-is-zero
	We require qemu-img 5.2.0, so we don't need hacks for older qemu-img
	that did not support the --target-is-zero option.

	qemuimg: Remove hacks for bitmap support
	We have required qemu 5.2.0 for a while, so we don't need hacks for
	supporting qemu-img versions without bitmap support or with buggy
	support.

	tests: backup: Use more specific names
	expected_xml is not specific enough. We have 2 cases:
	- input_xml, for testing the xml generated by start_backup()
	- output_xml, for simulating libvirt state

	Currently most tests use input_xml, and only one test uses output_xml.
	Rename expected_xml to a more specific name.

	To monitor scratch disk, start_backup() will have to parse libvirt
	backup xml, so we will need to add output_xml to all tests using
	start_backup().

	tests: backup: Simplify monkeypatching
	Almost all the tests use the tmp_backup and temp_basedir fixtures to
	monkeypatch the standard locations for the backup socket and transient
	disks. No test uses the monkeypatched values, so there is no need to
	have 2 fixtures. Merge both fixtures into a simpler tmp_dirs fixture.

	tests: backup: Rename private helpers
	Tests do not need to use private names.

	tests: backup: Simplify scratch disk creation
	The tmp_scratch_disks fixture is too rigid and used only by one test.
	Replace it with a simple function creating a temporary qcow2 file that
	can be used as a scratch disk.

	tests: backup: Make fake vm drives configurable
	Currently FakeVM always uses the same drives, defined as a module
	constant. There are several issues with this:

	- We cannot test different kinds or numbers of drives
	- All tests depend on global data, so it is very hard to change the
	  tests
	- It is very hard to understand the test setup since the state is
	  defined far away from the actual test.

	This change makes it possible to use FakeVM with different drives, but
	does not change any tests yet.

	tests: backup: Make all fields in FakeDrive configurable
	For monitoring scratch disks, we need drives with diskType="block".
	Make all the attributes settable via the constructor to make it easy
	to create different kinds of drives for different tests.

	tests: backup: Simplify FakeHSM
	There is no need to keep a ready state when we always report True, but
	we must override the original __init__ to avoid side effects. Add a
	docstring explaining this.

	drivemonitor: Clean up logs
	- Show also the path when setting a block threshold, so we can detect
	  what was the drive path when the threshold was set.

	- Show the excess value from the libvirt block threshold event. It may
	  help to diagnose issues when the VM writes data too fast.

	- Use "for drive 'name'" instead of "on 'name'" to match related logs.

	- Capitalize messages that were not capitalized.

	Related-To: https://bugzilla.redhat.com/1948177

2021-06-24  Nir Soffer  <nsoffer@redhat.com>

	docs: Configuring based on outage duration
	Due to the way sanlock renews leases, it cannot tolerate an outage of
	8 * io_timeout. The maximum outage duration is actually 5 * io_timeout,
	or if we are lucky, 7 * io_timeout.

	Explain how to compute sanlock:io_timeout based on the storage outage
	duration, and how to compute multipath no_path_retry based on
	sanlock:io_timeout.

	In the example configurations table, show "outage duration" instead of
	"effective timeout" which is confusing to users.

	Use "sanlock lease" instead of "storage lease". The previous term is
	correct but it was confusing.

	Add section showing sanlock renewal flows, to make the calculation
	clear.
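
	The arithmetic, with the common 10-second io_timeout as an assumed
	example value:

```python
io_timeout = 10  # seconds; example value, not necessarily your config

# sanlock leases expire after 8 * io_timeout without renewal, but
# because of renewal timing only 5 * io_timeout of outage is always
# survivable, and 7 * io_timeout only with lucky timing.
lease_expiry = 8 * io_timeout
guaranteed_outage = 5 * io_timeout
lucky_outage = 7 * io_timeout
```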

	You can preview the rendered document here:
	https://github.com/nirs/vdsm/blob/io-timeouts/doc/io-timeouts.md

2021-06-23  Vojtech Juranek  <vjuranek@redhat.com>

	caps: add capability for refreshing disk
	In a previous patch a new API for refreshing Vm drives was added. To
	easily identify hosts which support this API, and to avoid catching
	exceptions from json-rpc when the API doesn't exist on the host, add a
	new capability `refresh_disk_supported`. When it's present, the host
	supports VM drive refresh.

	Bug-Url: https://bugzilla.redhat.com/1883399

2021-06-23  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.70.5

2021-06-21  Vojtech Juranek  <vjuranek@redhat.com>

	doc: document chunked drive extension flow during migration
	- remove the wrong description of how extend during migration behaves,
	  which never seems to have been implemented in the code
	- describe how volume extend during migration is implemented, with
	  respect to previous patches

	Bug-Url: https://bugzilla.redhat.com/1883399

	virt: refresh disk on destination during migration
	VM's disks can change on the underlying storage while migration is
	running. One example is extending the disk. The disk is extended on
	the source host, but the change is not visible on the destination
	host.

	Refresh the disk being extended first on the destination host during
	migration, to be sure that the disk won't become corrupted when the
	migration process is finished and the VM on the destination is resumed
	by libvirt.

	Bug-Url: https://bugzilla.redhat.com/1883399

	virt: add new API for volume refresh
	Add a new API method for refreshing VM drives. This is useful for
	migration workflows, when a VM volume on the source host can be
	extended and we need to extend the volume also on the destination
	host; otherwise the volume can be corrupted once the VM is started on
	the destination host.

	Bug-Url: https://bugzilla.redhat.com/1883399

	virt: move VolumeSize into utils module
	This named tuple will also be needed in the migration module in
	follow-up patches. To avoid circular dependencies, move it into the
	virt utils module.

	vm: raise when volume refresh fails
	Don't ignore the status returned from the RPC call, and raise when
	there was an error during volume refresh.

2021-06-21  Nir Soffer  <nsoffer@redhat.com>

	spec: Require ovirt-imageio >= 2.2.0-1
	Recently we enabled qemu:allocation-depth meta context in vdsm NBD
	server. Require imageio version using this capability to support single
	snapshot transfers.

	While updating imageio version, clean up the requirement to require the
	version only once, since imageio daemon requires a specific imageio
	common version.

	Bug-Url: https://bugzilla.redhat.com/1971182

	Revert "spec: Exclude broken libvirt version"
	This reverts commit fc75836c5468c81bdf540bcc6101f485df5e235d.

	The original patch was needed for these bugs:

	- https://bugzilla.redhat.com/1966842
	  Fixed upstream

	- https://bugzilla.redhat.com/1971182
	  Fixed in vdsm master and imageio 2.2.0

	- https://bugzilla.redhat.com/1970277
	  Relatively small bug that's not worth blocking recent libvirt and
	  qemu.

	The current limit on libvirt makes it harder to test vdsm with the
	current libvirt and qemu versions in RHEL 8.5.

	Finally, the original patch does not work in OST.

	If we want to limit libvirt version on Centos Stream, it should be done
	in the stable branch, not in master.

2021-06-19  Roman Bednar  <rbednar@redhat.com>

	volumemetadata: improve parsing of storage values
	Currently the _lines_to_dict() function can fail when decoding a value
	from storage if the value is not valid, which causes the remaining
	values to be skipped.

	Add a try/except block to save the errors and skip the invalid values.

	If there is a skipped value, it is later treated as a missing key, and
	the saved errors are logged later in volumemetadata.dump().
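
	The described parsing can be sketched like this (the names and the
	decoder are illustrative, not the actual vdsm code):

```python
def decode_value(key, value):
    # Illustrative decoder: sizes must be integers.
    if key == "SIZE":
        return int(value)
    return value


def lines_to_dict(lines):
    md = {}
    errors = []
    for line in lines:
        key, _, value = line.partition("=")
        try:
            md[key] = decode_value(key, value)
        except ValueError as e:
            # Save the error and skip the value; the key is later
            # treated as missing and the errors are logged by dump().
            errors.append("Invalid value for key %r: %s" % (key, e))
    return md, errors
```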

	Bug-Url: https://bugzilla.redhat.com/1870887

	blockSD: fileSD: allow dumping incomplete metadata
	Use the new parse() method for dumping, which allows displaying
	metadata even if some values are missing on storage.

	Generation defaults to a constant (sc.DEFAULT_GENERATION), so some
	tests need to change as well. The old code was raising before it got
	to assigning this default.

	Also add a dump() function to volumemetadata to avoid repeating code
	in the blockSD and fileSD dump flows.

	Bug-Url: https://bugzilla.redhat.com/1870887

2021-06-18  Ales Musil  <amusil@redhat.com>

	net: Update python black formatter
	Update python black formatter to latest version 21.6b0.

2021-06-17  Nir Soffer  <nsoffer@redhat.com>

	image: Fix measure of block volume in copyCollapse
	Since qemu 5.1 we need to use the "host_device" driver when accessing
	block devices. This was added in vdsm 4.40.33 in:

	commit 1718b8784e841405574c44abe2357997e3235723
	CommitDate: Fri Oct 2 08:21:15 2020 +0000

	    storage: support measuring without backing file

	But we missed a second call to qemuimg.measure() in copyCollapsed().
	When creating a template from vm without a parent volume, the operation
	fails with:

	    vdsm.common.cmdutils.Error: Command ['/usr/bin/qemu-img', 'measure',
	    '--output', 'json', '-O', 'qcow2', 'json:{"file": {"driver": "file",
	    "filename": "/path..."}, "driver": "qcow2"}'] failed with rc=1 out=b''
	    err=b'qemu-img: Could not open \'json:{"file": {"driver": "file",
	    "filename": "/path..."}, "driver": "qcow2"}\': \'file\' driver
	    requires \'/path...\' to be a regular file\n'

	Copying a volume with a parent is not affected, since in this flow we
	use another way to estimate the destination volume size.

	Fixed by adding "block" boolean key in Volume.getVolumeParams(), and
	using it to call qemuimg.measure() correctly when measuring a block
	volume.

	is_block() was moved up to VolumeManifest() to make pylint happy. The
	tests faking volume parameters were updated to also report the new
	"block" key.

	Bug-Url: https://bugzilla.redhat.com/1973345

2021-06-16  Roman Bednar  <rbednar@redhat.com>

	volumemetadata: add parser method for validating storage data
	Add a parser function that can either be used to validate input data
	for instantiating the VolumeMetadata class using from_lines(), or be
	used to parse invalid or incomplete metadata without instantiating
	the class.

	This method is flexible enough to be used for dumping invalid or
	incomplete metadata by SD dump flow.

	Bug-Url: https://bugzilla.redhat.com/1870887

2021-06-16  Marcin Sobczyk  <msobczyk@redhat.com>

	New release: 4.40.70.4

2021-06-16  Eyal Shenitzky  <eshenitz@redhat.com>

	caps.py: add support for bitmap removal
	Bug-Url: https://bugzilla.redhat.com/1952577

	clear_bitmaps.py: introduce clear_bitmaps job
	When a snapshot is restored, all the bitmaps should be removed
	from the volume chain.

	Removing the bitmaps one by one using the remove_bitmap job can
	flood the engine and VDSM with jobs, so a new job is needed for
	removing the bitmaps in an efficient way.

	The engine will use this new job to remove all the bitmaps
	from a volume, and will use it for the entire chain when
	a snapshot is restored.

	Bug-Url: https://bugzilla.redhat.com/1952577

	bitmaps.py: add clear all volume bitmaps operation
	Add new clear_bitmaps() operation to remove all the given
	volume bitmaps.

	Will be used to support the option to clean all the VM
	checkpoints when a snapshot is restored.

	Bug-Url: https://bugzilla.redhat.com/1952577

	remove_bitmap_test.py: add tests for bitmap removal
	Add tests for the remove volume bitmap job.
	Those tests were probably missing since the engine wasn't using this
	job until now.

	The engine will use remove volume bitmap job to remove a VM checkpoint
	when the VM is down.

	Bug-Url: https://bugzilla.redhat.com/1966177

2021-06-15  Nir Soffer  <nsoffer@redhat.com>

	tests: Unify lvm tests names
	The new pvmove tests are not consistent with other names, making it
	harder to run specific tests. Rename to "test_pv_move_cmd" and
	"test_pv_move". This allows running all pv tests using:

	    tox -e storage -- -k test_pv

	tests: Skip pvmove test also on travis
	This test fails in travis in the same way it fails on ovirt ci:

	    modprobe: FATAL: Module dm-mirror not found in directory /lib/modules/5.4.0-1044-gcp
	    /usr/sbin/modprobe failed: 1
	    Required device-mapper target(s) not detected in your kernel.

2021-06-15  Roman Bednar  <rbednar@redhat.com>

	exception: rename MetaDataKeyNotFoundError to InvalidMetadata
	MetaDataKeyNotFoundError is no longer needed as we want to allow
	dumping storage domain metadata even if some keys are missing.

	Renaming to InvalidMetadata will allow using this exception in a more
	generic way without having to add a new exception specifically for
	cases where some keys are missing.

	Bug-Url: https://bugzilla.redhat.com/1870887

2021-06-14  Marcin Sobczyk  <msobczyk@redhat.com>

	spec: Exclude broken libvirt version
	'libvirt-7.4.0-1.el8s' from AV repository introduced a regression
	in IOThreads checks around 'virDomainAttachDevice' [1]. This patch
	excludes the broken version. With that, we'll effectively be using
	the older version published in the AV repository,
	'libvirt-7.0.0-14.1.el8', until the problem is resolved.

	The downgrade of the libvirt version requires using older
	'qemu-kvm < 6.0' from AV as well, since 'libvirt-7.0' shows
	incompatibilities with the newer versions. On top of that,
	'qemu-kvm-6.0' itself also has some bugs reported [2] that would
	pose problems from vdsm's point of view.

	Once new, fixed versions of qemu-kvm and libvirt are published to
	the AV repo, we should remove these restrictions.

	[1] https://bugzilla.redhat.com/1970277
	[2] https://bugzilla.redhat.com/1971182

2021-06-14  Roman Bednar  <rbednar@redhat.com>

	volumemetadata: rename attribute to increase consistency
	The parent attribute is not consistently named; there is no reason
	to call it 'puuid' since we present the value as 'parent', e.g. in
	dump output.

	The only place 'puuid' is used is the storage format key (sc.PUUID const.)

	Renaming it for consistency to allow cleaner code in followup patches.

	Bug-Url: https://bugzilla.redhat.com/1870887

2021-06-14  Nir Soffer  <nsoffer@redhat.com>

	nbd: Expose allocation depth in nbd server
	Run qemu-nbd with the --allocation-depth option, exposing the
	"qemu:allocation-depth" meta context. This is required for downloading
	a single snapshot using backing_chain=False.

	ovirt-imageio will use this info to report holes in qcow2 images
	correctly with qemu-nbd >= 6.0.0.

	Bug-Url: https://bugzilla.redhat.com/1971182

2021-06-13  Nir Soffer  <nsoffer@redhat.com>

	spec: Require qemu-img providing allocation depth
	Allocation depth allows detection of unallocated extents in qcow2
	images, required for transferring single snapshots.

	qemu-nbd added the --allocation-depth option in version 5.2.0. This
	version is available now in RHEL 8.4, Centos Stream, and Centos (via
	virt sig).

	Bug-Url: https://bugzilla.redhat.com/1971182

	tests: Minimize use of imageio internals
	To test dirty bitmaps, we used ovirt_imageio._internal.nbd.Extent.flags.
	This is an internal detail which is not part of the imageio public
	API, and it may change between releases.

	imageio 2.2.0-1 changes the NBD_STATE_HOLE bit (1) to EXTENT_DIRTY (2).
	Tests using this value break with newer imageio versions.

	Change the test to access the extent properties (length, dirty) to make
	the test work with both old and new imageio.

	Bug-Url: https://bugzilla.redhat.com/1971182

2021-06-11  Lev Veyde  <lveyde@redhat.com>

	Automatically fix bad ownership/access mode of vdsm log files
	Currently the vdsm service will fail to start if it can't write to
	the vdsm log file. This can happen e.g. due to a bad logrotate run
	that leaves the vdsm log file owned by root.

	This patch adds checks to verify that both file owners and access
	mode are set correctly for both vdsm and mom log files.

	Bug-Url: https://bugzilla.redhat.com/1970008
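
	The check-and-repair step can be sketched like this (the uid/gid
	and mode values are illustrative assumptions, not vdsm's actual
	constants):

```python
import os
import stat


def needs_fix(st, uid, gid, mode=0o644):
    """Return True if a log file's owner, group or access mode does
    not match what the service expects."""
    return (st.st_uid, st.st_gid) != (uid, gid) \
        or stat.S_IMODE(st.st_mode) != mode


def fix_log_file(path, uid, gid, mode=0o644):
    """Repair ownership and mode before the service starts, so a bad
    logrotate run cannot prevent startup."""
    st = os.stat(path)
    if needs_fix(st, uid, gid, mode):
        os.chown(path, uid, gid)
        os.chmod(path, mode)
```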

2021-06-10  Marcin Sobczyk  <msobczyk@redhat.com>

	build: Remove remaining usage of 'target_py' macro
	We don't need to differentiate py2 and py3 builds anymore.

	spec: Remove remaining py2-related vars

	spec: Remove py2 support for vdsm package

	spec: Remove py2 support for vdsm-common package

	spec: Remove py2 support for vdsm-network package

	spec: Remove py2 support for vdsm-python package

	spec: Remove py2 support for vdsm-hook-openstacknet package

	spec: Remove py2 support for vdsm-hook-checkips package

	spec: Remove py2 support for vdsm-hook-fileinject package

	spec: Remove py2 support for vdsm-http package

	spec: Remove py2 support for vdsm-client package

	spec: Remove py2 support for vdsm-jsonrpc package

	spec: Remove py2 support for vdsm-api package

	spec: Remove py2 support for yajsonrpc package

	spec: Remove py2 support for vdsm-gluster package

	spec: Remove py2 requirements
	This patch removes all '%if' conditionals around requirements that
	dealt with maintaining py2 compatibility.

2021-06-10  Roman Bednar  <rbednar@redhat.com>

	tests: add basic test for pvmove
	pvmove was lacking test coverage; add a simple test for
	pvmove with data verification before and after the move.

	This test works in a container env but not in a mock env because the
	dm-mirror kernel module is missing, causing pvmove to fail. The test
	is marked to skip on oVirt CI because of that.

	Bug-Url: https://bugzilla.redhat.com/1949059

2021-06-09  Nir Soffer  <nsoffer@redhat.com>

	tox: Increase storage minimal coverage
	On oVirt CI we have 69% coverage, and in Travis 68.6%. Increase the
	minimal coverage to 68%.

	docker: Require qemu-img 6.0.0
	Qemu 6.0.0 is available now in Centos Stream advanced virt repo[1]. We
	want to consume it now to detect regressions early.

	I'm not sure we can require this version in the CI yet, since we run
	some tests on RHEL 8.4, providing only qemu 5.2.0.

	[1] http://mirror.centos.org/centos/8-stream/virt/x86_64/advancedvirt-common/

2021-06-09  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.70.3

2021-06-09  Ales Musil  <amusil@redhat.com>

	net: Force installation of python3-openvswitch
	nmstate-plugin-ovsdb recommends python3-openvswitch2.1*,
	but the version is not restricted. That can lead
	to issues, as it might install a higher version of ovs.

	To prevent this we will require python3-openvswitch,
	which is provided only by the ovirt-python-openvswitch wrapper
	package; that package requires the correct version of
	python3-openvswitch2.1*.

	Bug-Url: https://bugzilla.redhat.com/1966143

2021-06-08  Milan Zamazal  <mzamazal@redhat.com>

	virt: Switch to POWERING_DOWN status only from UP
	Logic at many places is dependent on Vm.lastStatus.  POWERING_DOWN VM
	status is mostly informative and used mainly in interaction with a
	guest agent.

	When asking a VM to shut down, it's dangerous to change its status to
	POWERING_DOWN status unconditionally.  For instance, consider the
	following scenario, observed in the attached bug:

	- A VM is migrating to another host.
	- The migration is about to finish.
	- VM.destroy call comes and sets POWERING_DOWN status.
	- Before the VM is destroyed, the migration finishes successfully.
	- The corresponding life cycle event handler sees the VM got down, but
	  is not in MIGRATION_SOURCE status and thus the corresponding
	  migration cleanup is not initiated.
	- The migration source thread is not finished and the outgoing
	  migration semaphore is not adjusted.
	- When it happens the second time, the outgoing migration semaphore is
	  exhausted and no further migrations are possible until Vdsm is
	  restarted.

	The VM life cycle logic is driven by switching the VM statuses and is
	not necessarily bound to the current state of the VM.  For example,
	when a migration finishes and the VM gets down on the source host,
	Vdsm can see in its status that the VM was migrating although the
	migration job is already done.  So using other checks at critical
	points would require tracking means similar to the VM status rather
	than just examining the current libvirt state of the VM.
	Additionally, there is a danger of causing confusion at Engine side.

	For these reasons, this patch allows switching the VM status to
	POWERING_DOWN only when the VM is UP, when it is completely safe to do
	so.

	In theory, it may not be without caveats.  For example, if a migration
	fails while the VM is powering down, we end up in UP state rather than
	POWERING_DOWN.  But this is (also) a corner case and at worst Engine
	will allow issuing VM power off again, which shouldn't cause any harm,
	unlike not executing actions bound to special VM statuses.  This could
	be improved, if needed, in further patches, where Vm._guestEvent could
	be considered.

	An additional protection could be to cancel a migration when VM
	shutdown or destroy is called.  But as for shutdown, Engine doesn't
	call shutdown during migration and calls power off instead.  And as
	for destroy, it could be dangerous to attempt additional actions
	there, especially when calling destroy is part of the migration flow
	itself.  Finally, a migration cannot be always canceled, most notably
	when the migration is in post copy phase.

	Bug-Url: https://bugzilla.redhat.com/1959436
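
	The guarded transition can be sketched as follows (the status
	strings and the tiny Vm class are simplified stand-ins for vdsm's
	virt.Vm and vmstatus constants):

```python
UP = "Up"
POWERING_DOWN = "Powering down"
MIGRATION_SOURCE = "Migration Source"


class Vm:
    def __init__(self, status=UP):
        self.lastStatus = status

    def set_powering_down(self):
        """Switch to POWERING_DOWN only from UP. In any other state,
        e.g. while a migration is about to finish, keep the current
        status so life-cycle handlers still see the state they depend
        on. Returns True if the status was changed."""
        if self.lastStatus == UP:
            self.lastStatus = POWERING_DOWN
            return True
        return False
```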

2021-06-08  Nir Soffer  <nsoffer@redhat.com>

	docker: Require qemu-img-5.2.0
	The package is available in Centos stream[1] so there is no reason to
	test an older version. This version should enable all tests that could
	not work with the older version.

	[1] http://mirror.centos.org/centos/8-stream/virt/x86_64/advancedvirt-common/Packages/q/qemu-img-5.2.0-11.el8s.x86_64.rpm

	docker: Fix bad container name
	Since commit ace476a566ae928814aa0735ad10175692054a98

	    automation: Fix Travis container

	The Dockerfile cannot be built since the container name is missing
	the server name.

	spec: Require libvirt 7.0.0-14
	This version fixes a critical bug[1] in live storage migration, where
	after a "successful" operation libvirt forgets the target disk and
	all operations on the VM fail.

	Since we support only RHEL(-like) distros now, and the package
	should exist in CentOS Stream, remove the complicated branches.

	The package is available in Centos stream advancedvirt-common repo:
	http://mirror.centos.org/centos/8-stream/virt/x86_64/advancedvirt-common/Packages/l/

	The advancedvirt-common repo was added to check-patch.repos. I'm not
	sure this is the right way since the build will fail if the remote repo
	is not accessible, but lets see if this actually works.

	[1] https://bugzilla.redhat.com/1955667

	tests: Fix tests checking sparse images
	Some tests assumed that qemu allocates at least one file system block,
	but this assumption is not correct for current qemu-img. On XFS it may
	set the file system extent size to 1 MiB, which increases the minimal
	allocation size.

	Other tests wrote a small amount of data, which makes it hard to
	predict the total allocation. Change the tests to write 1 MiB of data
	so they behave in the same way on different file systems.

	automation: Simplify pip requirements
	We installed virtualenv using pip, requiring virtualenv<20. This was
	probably required when we had to build both python 2 and 3. Currently
	installing virtualenv<20 fails with:

	    Could not find a version that satisfies the requirement
	    virtualenv<20 (from versions: )

	The error may be caused by a network outage, but it reveals that we
	are using unneeded requirements.

	Python 3 has the venv module; use it instead of virtualenv.

	We also required coverage>=5. I don't remember why we added this;
	let's use the latest version.

	tox: Check code style only in specific directories
	We used '.' which includes .local, used by the latest virtualenv.
	This causes flake8 to fail with bogus errors about code style under:

	    .local/share/virtualenv

	This should probably be fixed in flake8, but we really should be more
	precise when running flake8.

2021-06-08  Liran Rotenberg  <lrotenbe@redhat.com>

	snapshot: fix teardown handling
	When we tear down the snapshot operation, after the pivot and updating
	the drives parameter, we shouldn't fail the snapshot job. This case may
	happen when the host loses its connection to the storage; on the second
	attempt, some values such as the volume ID might be missing. But the
	snapshot was created successfully and is usable.

	Bug-Url: https://bugzilla.redhat.com/1946193

2021-06-08  Roman Bednar  <rbednar@redhat.com>

	disable lvmpolld for pvmove to fix LUN reduce flow
	LUN reduce currently fails when the LUN is not empty because pvmove
	has to be run in order to move the data to a different PV. The pvmove
	uses a specific filter on the command line, rejecting all devices
	except the ones suitable for the data move.

	This filter is not passed to lvmpolld, and lvmpolld sees only the
	filter defined in lvm.conf. So when the filter in lvm.conf is set to
	reject more devices than the pvmove needs, the operation fails because
	lvmpolld does not see those devices.

	This can be solved using the devices file feature of lvm, which should
	be available later in RHEL 8.5 and RHEL 9.0. At this point we can only
	disable lvmpolld for the pvmove operation to provide a quick solution.

	Bug-Url: https://bugzilla.redhat.com/1949059
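
	Building such a pvmove command can be sketched like this (a
	simplified illustration; vdsm constructs lvm commands through its
	lvm module, and the filter format here is an assumption):

```python
def pvmove_cmd(src_pv, dst_pv, devices):
    """Build a pvmove command that accepts only the given devices and
    bypasses lvmpolld, so the command-line filter is honored for the
    whole operation instead of lvmpolld falling back to lvm.conf."""
    filt = ", ".join('"a|%s|"' % d for d in devices) + ', "r|.*|"'
    config = ("devices {filter=[%s]} global {use_lvmpolld=0}" % filt)
    return ["pvmove", "--config", config, src_pv, dst_pv]
```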

2021-06-07  Vojtech Juranek  <vjuranek@redhat.com>

	tests: add migration thread into live merge test VM
	Add a migration thread into the live merge RunningVM. This is needed
	by anything touching migration, e.g. extending a disk, which is used
	in tests in this module and uses VM.after_volume_extension() as a
	callback. In a followup patch the callback will include a check
	whether migration is ongoing, which would cause some of the tests to
	fail.

	Bug-Url: https://bugzilla.redhat.com/1883399

2021-06-07  Nir Soffer  <nsoffer@redhat.com>

	tests: Nicer names
	Replace cryptic names like "fm" with nicer names like "master".
	It is very clear that this is fake master since we create it using:

	    master = FakeMaster()

	Use "panic" instead of "fake_panic" so we can assert:

	    assert panic.was_called

	Keep cb as is since this is a common name for callbacks.

2021-06-07  Ales Musil  <amusil@redhat.com>

	common, net: Check ovirt-openvswitch package instead of rhv-openvswitch
	rhv-openvswitch is being replaced d/s with ovirt-openvswitch
	to maintain a single package. It should not affect oVirt at
	all because rhv-openvswitch was never used there.

2021-06-03  Nir Soffer  <nsoffer@redhat.com>

	spec: Require sanlock 3.8.3-3
	This version fixes a critical bug[1] when receiving a bad client
	message, and exposes the sanlock.inquire() API[2], used by vdsm to
	monitor the SPM lease.

	Sanlock 3.8.4-1 is available now in Centos stream[3], and sanlock
	3.8.3-3 build is available in RHEL for testing.

	[1] https://bugzilla.redhat.com/1965481
	[2] https://bugzilla.redhat.com/1965483
	[3] http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/sanlock-lib-3.8.4-1.el8.x86_64.rpm

	Bug-Url: https://bugzilla.redhat.com/1961752

2021-06-02  Nir Soffer  <nsoffer@redhat.com>

	sp: Watch the SPM lease
	After acquiring the SPM lease, start watching the lease status, and
	panic if the status is not expected. Before releasing the SPM lease,
	stop the watchdog.

	The watchdog is used only when sanlock supports inquire(). This version
	is available now in Centos stream, and will be available soon in RHEL
	8.4.z.

	Starting and stopping the watchdog panic on errors:
	- If we fail to start the watchdog, we cannot be sure that the lease
	  remains valid.
	- If we fail to stop the watchdog, the watchdog will panic when we
	  release the lease.

	Failure to start or stop the watchdog is very unlikely. If we cannot
	start or stop a thread, we probably have a much bigger issue.

	The watchdog is enabled by default, but can be disabled if it causes
	trouble. The check interval is 20 seconds, matching the sanlock check
	interval (io_timeout * 2).

	To change the configuration use this drop-in file:

	$ cat /etc/vdsm/vdsm.conf.d/99-local.conf
	[spm]
	watchdog_enable = true
	watchdog_interval = 20

	Bug-Url: https://bugzilla.redhat.com/1961752

	spwd: Introduce storage pool watchdog
	Add a watchdog watching the master storage domain cluster lock after
	the SPM lease is acquired, and before it is released. The watchdog
	will panic on issues with the SPM lease, causing the host to lose the
	SPM role, and killing all child processes. This will trigger
	selection of a new SPM.

	The watchdog panics in these cases:
	- The SPM lease is not reported by sanlock.
	- The SPM lease has unexpected disk (path, offset)
	- Inquiring sanlock failed with temporary error more than 3 times
	- Inquiring sanlock failed with unexpected error

	We should also verify the lease version, but it turns out that the
	engine does not pass the SPM lease version, so we don't have it when
	starting the SPM.

	The watchdog should be used only when the master domain cluster lock
	reports the new "supports_inquire" property.

	New logs introduced by this change:

	Starting the watchdog:

	    17:24:35,677 INFO    (MainThread) [storage.spwd] Start watching cluster
	    lock Lease(lockspace='master-domain-uuid', resource='SDM',
	    disk=('/master/leases', 1048576), version=1) (spwd:79)

	Stopping the watchdog:

	    17:24:36,179 INFO    (MainThread) [storage.spwd] Stop watching cluster
	    lock Lease(lockspace='master-domain-uuid', resource='SDM',
	    disk=('/master/leases', 1048576), version=1) (spwd:88)

	In debug level, we log each successful check:

	    17:24:35,879 DEBUG   (spwd) [storage.spwd] Found cluster lock
	    {'lockspace': 'master-domain-uuid', 'resource': 'SDM', 'version': 1,
	    'disks': [('/master/leases', 1048576)]} (spwd:149)

	If the watchdog fails with a temporary error, we log a warning:

	    17:35:54,378 WARNING (spwd) [storage.spwd] Error (2/3) checking cluster
	    lock Lease(lockspace='master-domain-uuid', resource='SDM',
	    disk=('/master/leases', 1048576), version=1) (spwd:119)

	If the watchdog panics, we will see a panic exception message like:

	    Panic: Invalid cluster lock disk expected=Lease(
	    lockspace='master-domain-uuid', resource='SDM',
	    disk=('/master/leases', 1048576))
	    actual={'lockspace': 'master-domain-uuid', 'resource': 'SDM',
	    'version': 2, 'disks': [('/master/leases', 10485760)]}

	Bug-Url: https://bugzilla.redhat.com/1961752

2021-06-02  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.70.2

2021-06-02  Roman Bednar  <rbednar@redhat.com>

	lvm: avoid exception for in use volume deactivation
	If two VMs use the same ISO in a block storage domain and one of the
	VMs is terminated, the teardownImage flow ends up raising an
	exception when trying to deactivate the logical volume for that
	image.

	It is safe to ignore the deactivation attempt, and instead of raising
	this exception we can treat the volume as busy and just log that we
	are ignoring a failed change of the volume.

	Unfortunately lvm currently does not provide very reasonable exit
	codes, so the only way to tell that a volume is busy is inspecting
	the error message of the lvchange command.

	The correct solution to this problem would be to implement a
	reference counting mechanism so we can keep track of volume usage.
	This however requires more work and is planned to be added in future
	versions.

	Bug-Url: https://bugzilla.redhat.com/1725915
	Related-to: https://bugzilla.redhat.com/1536880
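
	The message-inspection approach can be sketched like this (the
	matched substring and the injected command runner are assumptions
	for illustration; vdsm runs lvm commands through its own helpers):

```python
def is_in_use_error(stderr):
    """Detect the busy-volume failure from lvchange stderr. lvm does
    not return a distinctive exit code for this case, so matching the
    error message is the only available hint."""
    return "in use" in stderr


def deactivate(run_lvchange, lv_path, log):
    """Deactivate an LV, ignoring (but logging) the failure when the
    volume is still in use by another VM."""
    rc, out, err = run_lvchange(["lvchange", "-an", lv_path])
    if rc != 0:
        if is_in_use_error(err):
            log("LV %s is in use, ignoring deactivation failure", lv_path)
            return
        raise RuntimeError(err)
```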

2021-06-01  Nir Soffer  <nsoffer@redhat.com>

	clusterlock: Add inquire()
	The new inquire() method is supported only by the SANLock cluster
	lock (when the sanlock module has this function). To detect inquire
	support, all locks now have a "supports_inquire" class attribute.
	Callers should check the capability before trying to invoke inquire.

	The new SanlockInquireError provides an is_temporary() predicate to
	allow callers to handle temporary errors correctly.

	Bug-Url: https://bugzilla.redhat.com/1961752
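
	The capability check can be sketched like this (class and method
	bodies are simplified stand-ins, not vdsm's clusterlock code):

```python
class LocalLock:
    """Lock backend without inquire support; the class attribute
    advertises the capability to callers."""
    supports_inquire = False


class SANLock:
    supports_inquire = True

    def inquire(self):
        # The real implementation would call sanlock.inquire();
        # stubbed here for illustration.
        return [{"resource": "SDM"}]


def check_lease(lock):
    """Inquire the lease only when the backend supports it."""
    if not lock.supports_inquire:
        return None  # capability absent, skip the check
    return lock.inquire()
```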

	fakesanlock: Simulate EBUSY
	Sanlock may fail resource operations with EBUSY if a client command
	is busy performing an operation in the thread pool. This should not
	happen in vdsm since we serialize calls to acquire(), release(), and
	inquire(), but I'm not 100% sure this is not possible.

	Add a "busy" flag to the resource, so we can simulate a failure with
	an EBUSY error.

	Bug-Url: https://bugzilla.redhat.com/1961752

	fakesanlock: Fix simulated sanlock errors
	Sanlock errors have:
	- errno
	- error message
	- error name

	The error message is usually the same error used when a sanlock
	client API call returns a non-zero error code. Add an "error"
	variable to most fake APIs to simulate this behavior.

	The errno name depends on the errno. The sanlock client library
	always returns negative error codes, and they are normalized by the
	python binding to errno values for 0 > code > -200. Add an _error()
	helper for creating SanlockException() with the right error name
	based on the errno value.

	Fix all the APIs to use the same errors as we get from real sanlock.

	Bug-Url: https://bugzilla.redhat.com/1961752

2021-05-27  Milan Zamazal  <mzamazal@redhat.com>

	api: Add initial type annotations
	They allow jumping to the source code of virt.Vm methods called from
	API.py when using Python language server protocol.

2021-05-27  Ales Musil  <amusil@redhat.com>

	net: Add initial support for nmstate source routing
	Initial support includes logic for creating and removing
	static source routes through nmstate with tests for this code.

	However, the change is not yet activated; this is done
	via the next patch in this series.

	Bug-Url: https://bugzilla.redhat.com/1962563

	net: Generalize Route class in preparation for source routes
	Prepare Route class for addition of source routing and rules
	by generalizing some methods that might be useful for source
	routes.

	Bug-Url: https://bugzilla.redhat.com/1962563

	net: Introduce new CurrentState class
	The new CurrentState is responsible for holding all info
	that is available from state_show and might be important
	for generating a new state or displaying current netinfo.

	Bug-Url: https://bugzilla.redhat.com/1962563

	net: Hide some details of state generating behind NetworkingState
	With a growing number of produced state parts it is harder to keep
	track of them. To mitigate that, LinuxBridge and OvS will now
	produce an instance of NetworkingState which hides the details of
	setting MTUs on the semi-complete state and merging it into a
	single state consumed by nmstate.

	Bug-Url: https://bugzilla.redhat.com/1962563

2021-05-27  Vojtech Juranek  <vjuranek@redhat.com>

	doc: update package file name in README

2021-05-26  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.70.1

2021-05-25  Nir Soffer  <nsoffer@redhat.com>

	fakesanlock: Add inquire()
	Add the inquire() method, added recently to the sanlock python
	binding. This should be available in RHEL 8.4.z.

	Bug-Url: https://bugzilla.redhat.com/1961752

	clusterlock: Rename inquire() to inspect()
	We use the name inquire() for querying a lease that may be held by
	another host. This is a heavy operation using 3 sanlock APIs accessing
	storage. Sanlock has an inquire() API, which is a lightweight operation
	accessing the sanlock daemon state on the current host, and returning a
	list of leases owned by the caller program or by another program.

	Rename our special version to inspect(), to make room for
	sanlock.inquire().

	Bug-Url: https://bugzilla.redhat.com/1961752

2021-05-24  Dan Kenigsberg  <danken@redhat.com>

	tool_test: use a slightly nicer single write()

2021-05-24  Nir Soffer  <nsoffer@redhat.com>

	task: Use with statement
	Replace acquire and release using try-finally with `with` statement.
	This is cleaner, less fragile, and makes the code easier to grep when
	looking for acquire() and release() calls.

	resourceManager: Use with statement
	Replace acquire and release using try-finally with `with` statement.
	This is cleaner, less fragile, and makes the code easier to grep when
	looking for acquire() and release() calls.

	threadPool: Use with statement
	Replace acquire and release using try-finally with `with` statement.
	This is cleaner, less fragile, and makes the code easier to grep when
	looking for acquire() and release() calls.
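
	The before/after shape of this cleanup looks roughly like this
	(a generic illustration, not the actual vdsm code):

```python
import threading

lock = threading.Lock()


# Before: explicit acquire/release with try-finally.
def update_before(state, key, value):
    lock.acquire()
    try:
        state[key] = value
    finally:
        lock.release()


# After: the with statement releases the lock on any exit path,
# with less room for error and easier grepping for lock usage.
def update_after(state, key, value):
    with lock:
        state[key] = value
```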

	tool: sanlock: Validate max_worker_threads
	Previously if sanlock was running with a suboptimal value of worker
	threads we could not detect this. In sanlock 3.8.3 the value is
	reported, so we can detect this and require a restart.

	panic: Use logging.exception()
	logging.error(msg, exc_info=True) is the same as logging.exception(msg).
	This makes it more clear that this function should be called in an
	exception handler.

	panic: Limit time spent shutting down logging
	We call panic when we must terminate vdsm and its child processes.
	Shutting down logging is useful to get more info about the panic, but
	it should not delay termination for an unlimited time. A possible
	cause of delay is queued I/O when logging to a non-responsive NFS
	mount or when booting from a multipath device.

	Use signal.alarm() to interrupt logging.shutdown() after 10 seconds and
	kill the process group.

	Using signal.alarm() requires registering signal handler, which must be
	done from main thread, so we register SIGALRM handler during startup.
	The exception raised by the signal handler is raised in the thread
	calling panic.panic() if logging.shutdown() times out.
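
	The alarm-bounded call can be sketched like this (a generic
	helper, not the vdsm panic module; as noted above, it must run in
	the main thread because of the signal handler):

```python
import signal


class Timeout(Exception):
    pass


def _on_alarm(signum, frame):
    # Raised in the main thread, interrupting whatever it is doing.
    raise Timeout()


def with_timeout(seconds, func):
    """Run func, interrupting it with SIGALRM after the given number
    of seconds. Must be called from the main thread."""
    old = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old)
```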

	panic: Shutdown logging before killing process group
	When handling an unrecoverable error we typically log some error or
	exception, and then the panic message. For example, in this case we
	panic while releasing the SPM lease:

	    2021-05-20 17:10:24,173+0300 INFO  (jsonrpc/7) [storage.SANLock] Releasing Lease(name='SDM',
	    path='/dev/8ef0cc64-77c5-49e0-a0f5-03cee9bbee20/leases', offset=1048576) (clusterlock:602)

	    2021-05-20 17:10:29,957+0300 INFO  (MainThread) [vds] (PID: 853041) I am the actual vdsm
	    4.40.70.10.git91384f25a host4 (4.18.0-305.el8.x86_64) (vdsmd:158)

	But the logger thread was killed before writing the error and panic
	message to the log.

	Shutdown logging before killing the process group. This closes all
	handlers, ensuring that pending messages are logged.

	sp: Simplify stopSpm and improve panic log
	We tried to perform all operations during stopSpm, and panic at the
	end if any of the operations failed:

	1. Unmount master mount
	2. Stopping SPM mail monitor
	3. Releasing the SPM lease

	This can be simplified by calling panic() on the first failure.  When
	we panic, the process terminates, so all mailbox threads will be
	stopped.  Sanlock will detect vdsm termination and release the SPM
	lease.

	A nice side effect is that panic() is called in an exception handler in
	stopSpm, so it logs a proper traceback:

	    2021-05-22 01:15:07,314+0300 ERROR (jsonrpc/2) [root] Panic: Error releasing cluster lock (panic:31)
	    Traceback (most recent call last):
	      File "/usr/lib/python3.6/site-packages/vdsm/storage/clusterlock.py", line 610, in release
	        slkfd=SANLock._process_fd)
	    sanlock.SanlockException: (1, 'Sanlock resource not released', 'Operation not permitted')

	    During handling of the above exception, another exception occurred:

	    Traceback (most recent call last):
	      File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 465, in stopSpm
	        self.masterDomain.releaseClusterLock()
	      File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1252, in releaseClusterLock
	        self._manifest.releaseDomainLock()
	      File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 580, in releaseDomainLock
	        self._domainLock.release(self.getDomainLease())
	      File "/usr/lib/python3.6/site-packages/vdsm/storage/clusterlock.py", line 618, in release
	        raise se.ReleaseLockFailure(self._sdUUID, e)
	    vdsm.storage.exception.ReleaseLockFailure: Cannot release lock: ('8ef0cc64-77c5-49e0-a0f5-03cee9bbee20',
	    SanlockException(1, 'Sanlock resource not released', 'Operation not permitted'))

	Before this patch, the panic message was:

	    2021-05-22 00:18:04,569+0300 ERROR (jsonrpc/7) [root] Panic:
	    Unrecoverable errors during SPM stop process. (panic:31)
	    NoneType: None

	"NoneType: None" is the result of logging an error with exc_info=True
	when we are not in an exception handler.

2021-05-24  Ales Musil  <amusil@redhat.com>

	automation: Fix Travis container
	The travis container should use the latest nmstate
	available in Stream, with the addition of the ovs db
	plugin package.

2021-05-24  Nir Soffer  <nsoffer@redhat.com>

	sp: Wait until SPM mail monitor is stopped
	When stopping the SPM, we did not wait until the mail monitor was
	stopped before releasing the cluster lock. Stopping the mailbox sets a
	flag and returns. Some mailbox worker threads may be busy handling
	extension requests. If a new SPM is started, we can have 2 hosts trying
	to serve extension requests, which is likely to corrupt lvm metadata.

	Now we wait up to 60 seconds for mail monitor termination. If the
	operation times out we treat it as a fatal failure and panic.

	We expect to see this log in stopSpm flow:

	    2021-05-20 17:10:24,173+0300 INFO  (mailbox-spm) [storage.MailBox.SpmMailMonitor]
	    SPM_MailMonitor - Incoming mail monitoring thread stopped (mailbox:765)

	If stopping the mail monitor times out, we expect to see:

	    2021-05-20 19:28:54,562+0300 ERROR (jsonrpc/4) [storage.StoragePool] Timeout stopping SPM mail
	    monitor (sp:456)

	And vdsm will panic:

	    2021-05-20 19:28:54,568+0300 ERROR (jsonrpc/4) [root] Panic: Unrecoverable errors during SPM
	    stop process. (panic:31)

	sp: Failure to stop mail monitor is critical
	Previously we did not treat failure to stop the mail monitor as a
	fatal error. In this case we could have a running mail monitor on a
	non-SPM host. This is likely to corrupt lvm metadata.

	Treat this error as critical error. If mail monitor could not be
	stopped, vdsm will panic.

	spbackends: fix setSpmStatus signature
	Now that we log exceptions in stopSpm, we can see that the call always
	fails when using the memory backend:

	2021-05-20 17:10:24,173+0300 ERROR (jsonrpc/7) [storage.StoragePool] Error updating SPM status (sp:468)
	Traceback (most recent call last):
	  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 465, in stopSpm
	    __securityOverride=True)
	  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
	    return method(self, *args, **kwargs)
	TypeError: setSpmStatus() missing 1 required positional argument: 'lVer'

	This failure is fine since the memory backend drops the provided
	values. Fix the method signature to avoid this failure.

	sp: Don't hide error in stopSpm
	When an unrecoverable error occurs during stopSpm we mark the
	operation as failed, and continue with the next call. Finally we
	panic with a generic message:
	message:

	    2021-05-20 16:08:38,961+0300 ERROR (jsonrpc/2) [root] Panic:
	    Unrecoverable errors during SPM stop process. (panic:29)

	But since we did not log the error, we don't have a clue what was the
	unrecoverable error.

	Now we log an exception for every error during stopSpm.

2021-05-21  Yalei Li  <274268859@qq.com>

	virt: Make the return of extend_drive_if_needed more obvious
	The function extend_drive_if_needed is used to check if an extend
	operation was started, so it is more appropriate to return
	'True/False' than 'None'.

2021-05-20  Michal Skrivanek  <michal.skrivanek@redhat.com>

	spec: drop hard dependency on vmfex
	With the deprecation of vmfex functionality, the reasons in bug
	https://bugzilla.redhat.com/1286997 are no longer relevant. We also
	introduced ovirt-host to define a "mandatory" set of packages rather
	than directly requiring them in vdsm.

	Bug-Url: https://bugzilla.redhat.com/1947450
	Bug-Url: https://bugzilla.redhat.com/1899875

2021-05-20  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.70
	This is not for a build, just moving the version ahead of ovirt-4.4.6
	branch.

2021-05-19  Nir Soffer  <nsoffer@redhat.com>

	spec: Require sanlock 3.8.3
	We need this version to consume the latest features in sanlock. This
	version should be in CentOS Stream at this point, and if not, the CI
	will tell us about it.

2021-05-19  Ales Musil  <amusil@redhat.com>

	net: Use nmstate to specify bridge mappings
	Instead of direct ovs-vsctl usage, use nmstate ovsdb
	plugin.

	net, automation: Move dependencies from common.sh to packages file
	The openvswitch package is no longer needed as neither unit nor
	integration tests interact with ovs.

	NetworkManager is actually a dependency of nmstate and not
	directly ours.

	net, spec: Require nmstate ovs plugin
	The ovsdb plugin allows nmstate to operate on ovs database
	which is required for managing bridge mappings through nmstate.

2021-05-17  Sandro Bonazzola  <sbonazzo@redhat.com>

	README: make it more community friendly

2021-05-17  Ales Musil  <amusil@redhat.com>

	automation: Remove the rhel8 and el8stream suffixes
	Use single file for packages, environment and repos
	per sub stage. Both el8stream and rhel8 were linked to the
	same file.

2021-05-14  Ales Musil  <amusil@redhat.com>

	net: Remove dracut conf file
	The dracut conf file was omitting the ifcfg and
	clevis modules because they were causing issues with
	bridged networking. As a consequence, users who wanted
	to run auto-decrypt via clevis failed to do so.

	The original issue is resolved by the ability
	of NetworkManager to run in dracut that is enabled
	by default since EL8.3 [0].

	[0] https://bugzilla.redhat.com/1627820

	Bug-Url: https://bugzilla.redhat.com/1955571
	Bug-Url: https://bugzilla.redhat.com/1959945

2021-05-13  Nir Soffer  <nsoffer@redhat.com>

	tests: Remove unneeded requires_bitmaps_merge_support
	This was needed on Fedora for some time when the upstream version was
	missing a fix for merging bitmaps, but it has not been needed for a
	while. Replace with requires_bitmap_support.

	The mark was disabling a lot of tests on Fedora for no reason.

	qemuimg: Fix bitmaps_supported() with qemu 6.0.0-rc*.
	qemu release candidates are named:

	    qemu-img version 5.2.92 (qemu-6.0.0-0.2.rc2.fc32)

	The regular expression did not match since we expected only a single
	digit for the third number. This disables a lot of tests and bitmap
	functionality at runtime. Fortunately we don't support Fedora now, so
	this affects only the tests.
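	A sketch of the kind of fix involved (the patterns here are
	illustrative, not vdsm's exact regex): allowing multiple digits in the
	third version component makes release-candidate builds match.

```python
import re

# Illustrative patterns, not vdsm's exact code. The old pattern accepted
# only a single digit for the third component, so "5.2.92" (a 6.0.0-rc
# build) did not match at all.
OLD_RE = re.compile(r"qemu-img version (\d+)\.(\d+)\.(\d)\b")
NEW_RE = re.compile(r"qemu-img version (\d+)\.(\d+)\.(\d+)")

def parse_version(out, pattern=NEW_RE):
    m = pattern.search(out)
    return tuple(int(g) for g in m.groups()) if m else None

rc_build = "qemu-img version 5.2.92 (qemu-6.0.0-0.2.rc2.fc32)"
```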

2021-05-13  Vojtech Juranek  <vjuranek@redhat.com>

	virt: prevent changing CD or floppy during migration
	Prevent changing the CD or floppy disk during migration. It can result
	in an inconsistent or broken state, e.g. a volume not being prepared
	on the destination.

2021-05-12  Roman Bednar  <rbednar@redhat.com>

	xlease: add comment about trailing empty records in index
	Adding a comment which explains writing empty records in the
	index during index rebuild as suggested here:
	https://gerrit.ovirt.org/c/vdsm/+/114019/9/lib/vdsm/storage/xlease.py#810

	Bug-Url: https://bugzilla.redhat.com/1902127

2021-05-12  Nir Soffer  <nsoffer@redhat.com>

	clusterlock: Panic if we lost leases silently
	In sanlock.acquire() or sanlock.release(), the sanlock client
	communicates with the sanlock daemon using the process fd (slkfd=)
	instead of using a new connection (pid=). If the sanlock daemon closed
	the socket, the call will fail with SanlockException(EPIPE).

	If we are holding some leases when this error happens, this is a
	catastrophic event meaning that sanlock released our leases behind our
	back. This could lead to split brain and data corruption, similar to
	bug[1].

	In SANLock.acquire() we recovered from this error *silently* by
	registering a new socket with the sanlock daemon and retrying the call.
	This hides the catastrophic event, making it impossible to debug. In
	SANLock.release() we were raising the error, which is a little better,
	leaving some evidence in the log.

	When sanlock releases our leases behind our back, there is no good way
	to recover, so the best thing we can do is panic. Vdsm will lose the SPM
	role, and child processes started by vdsm will be killed.

	Add SANLock._lease_count counter, increased when acquiring a lease, and
	decreased when releasing a lease. When we detect EPIPE on the sanlock
	process fd, and the lease count is non-zero, we panic with this message:

	    Sanlock process fd was closed while holding 3 leases: ...

	[1] https://bugzilla.redhat.com/1952345
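	The counting idea can be sketched like this (a simplified
	illustration, not vdsm's actual SANLock class; here panic() just
	raises, while the real panic terminates the process):

```python
import threading

# Simplified illustration of the lease counting idea: track held leases
# and panic if the sanlock process fd is closed while any are held.
class LeaseCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self._lease_count = 0

    def acquired(self):
        with self._lock:
            self._lease_count += 1

    def released(self):
        with self._lock:
            self._lease_count -= 1

    def on_process_fd_closed(self):
        # Called when EPIPE is detected on the sanlock process fd.
        with self._lock:
            if self._lease_count > 0:
                raise RuntimeError(
                    "Sanlock process fd was closed while holding "
                    "%d leases" % self._lease_count)
```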

	fakesanlock: Simulate closing of sanlock process fd
	Make it possible to simulate the catastrophic event when sanlock process
	fd is closed by sanlock daemon.

	Also verify that sanlock.acquire() and release() are always called with
	the same fd returned by sanlock.register().

	clusterlock: Log changes in sanlock process fd
	Log when we register process fd with sanlock and when process fd is
	closed on sanlock side.

	clusterlock: Rename _sanlock_{fd,lock}
	Rename SANLock._sanlock_fd to SANLock._process_fd to make the purpose
	of this fd clearer.

	Rename SANLock._sanlock_lock to SANLock._process_lock to make it
	clear that this lock is protecting the process state.

	clusterlock: Cleanup formatting
	Reformat acquire() and release() in a more consistent way.

	- Remove unneeded resource_name temporary variable
	- Move extra_args out of the try block
	- Add blank line after raise
	- Use one argument per line in calls to sanlock.acquire()/release()

	clusterlock: Serialize access to sanlock fd
	Vdsm keeps a per-process sanlock socket (SANLock._sanlock_fd) used by
	sanlock to detect process termination. This socket is also used by
	acquire() and release() instead of creating a new connection to the
	sanlock daemon. Since we use the same socket from multiple threads,
	both acquire() and release() must take the global SANLock._sanlock_lock
	when using SANLock._sanlock_fd.

	Unfortunately release() was not taking SANLock._sanlock_lock since it
	was added in Jan 2012 in:

	commit 8665716cbf91401fadf59bae36a6e330e1b3156c

	    Use SANLock for the SPM resource

	Without taking SANLock._sanlock_lock, 2 threads can try to release()
	unrelated leases at the same time, which may corrupt the messages
	sent over the shared socket.

	When sanlock.release() is called, sanlock client performs:

	  send header
	  send body

	If more than one thread tries to call sanlock.release() at the same
	time, the sends may be interleaved like this:

	  thread-1: send header 1
	  thread-2: send header 2
	  thread-1: send body 1

	When sanlock daemon tries to read body 1, it finds header 2 which has
	invalid data. Sanlock becomes confused, closes the socket and releases
	the leases owned by vdsm. This is incorrect behavior tracked by sanlock
	bug[1].

	This can lead to split-brain, with 2 processes thinking they own a
	lease. In the linked bug, we had 2 hosts running as SPM, when only one
	was holding the SPM lease.

	Add the missing lock to release(), and a comment about locking rules.

	[1] https://bugzilla.redhat.com/1955813

	Bug-Url: https://bugzilla.redhat.com/1952345
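	The interleaving problem and the fix can be demonstrated with a toy
	protocol (names and wire format here are illustrative, not sanlock's
	real protocol):

```python
import socket
import struct
import threading

# Toy sketch of why a shared socket needs a lock: each request is a
# header followed by a body, and interleaving them corrupts the stream.
_sock_lock = threading.Lock()

def send_request(sock, cmd, body):
    header = struct.pack("!II", cmd, len(body))
    # Without this lock, two threads may interleave header and body,
    # exactly as described in the commit message above.
    with _sock_lock:
        sock.sendall(header)
        sock.sendall(body)

def recv_request(sock):
    header = sock.recv(8, socket.MSG_WAITALL)
    cmd, size = struct.unpack("!II", header)
    return cmd, sock.recv(size, socket.MSG_WAITALL)
```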

2021-05-11  Nir Soffer  <nsoffer@redhat.com>

	sp: Rename UNSECURE to INSECURE
	And update the methods managing secure mode to modern style. This is not
	consistent with other code in sp.py, but this inconsistency has a
	purpose - it helps to tell which code is trustworthy and which is not.

	Here are example logs when the host becomes the SPM:

	2021-04-27 23:40:02,707+0300 INFO  (tasks/0) [storage.StoragePool]
	Switching storage pool fb7797bd-3cca-4baa-bcb3-35e8851015f7 to SECURE
	mode (sp:130)

	And when the host stops being the SPM:

	2021-04-27 23:40:14,195+0300 INFO  (jsonrpc/3) [storage.StoragePool]
	Switching storage pool fb7797bd-3cca-4baa-bcb3-35e8851015f7 to INSECURE
	mode (sp:137)

	Related-to: https://bugzilla.redhat.com/1952345

	sp: Log changes in secure mode
	The storage pool uses 2 operation modes:
	- secure: The SPM lease is acquired, and metadata operations are
	  allowed.
	- insecure: The SPM lease is not acquired and metadata operations are
	  not allowed.

	Add an INFO log when the storage pool switches between SECURE and
	INSECURE modes.

	Related-to: https://bugzilla.redhat.com/1952345

	sp: Improve logging when unmounting master
	Use INFO level when logging about the unmount. All state changes must be
	logged in INFO level.

	Capitalize the panic message and logs to be consistent with other logs.

	Use str.format() when formatting the panic message instead of fragile %
	formatting.

	Related-to: https://bugzilla.redhat.com/1952345

	blockSD: More idiomatic error handling
	When handling errors during doUnmountMaster(), handle unexpected
	errors before handling the expected condition.

	Also separate the next block after return or raise with a blank line
	to keep the flow clearer.

	Related-to: https://bugzilla.redhat.com/1952345

	sp: Improve canceling upgrade during stopSpm
	When stopping the SPM, we try to cancel the domain upgrade. If the
	upgrade already started, we wait until the upgrade is completed.
	Rename the function to reflect the actual intent, add a docstring, and
	improve the log message. The new log uses INFO level so we have more
	visibility into this critical flow.

	Related-to: https://bugzilla.redhat.com/1952345

2021-05-10  Ales Musil  <amusil@redhat.com>

	net, automation: Replace deprecated Bond.SLAVES with Bond.PORT
	With the new nmstate, Bond.SLAVES was deprecated in favor
	of Bond.PORT.

	At the same time use CentOS Stream for the linters
	to prevent issues with Bond.SLAVES/PORT.

2021-05-07  Ales Musil  <amusil@redhat.com>

	net, spec: Set required nmstate version to >= 1.0

	automation: Use CentOS Stream in automation
	oVirt will require a RHEL 8.4 equivalent from version 4.4.7; in order
	to test it we should use CentOS Stream.

	This patch does not use el8stream for the linters, because there
	is an issue with nmstate.

	net: Cleanup ip/address module
	Change the validation step to use IPAddressData instead of
	IPv4 and IPv6. This allows removing unused functions
	from the address module and moving the remaining functionality
	to a single file.

	net, tests: Align net_with_bond tests to use nettestlib helpers

	net: Replace six.reraise with py3 raise

2021-05-06  Marcin Sobczyk  <msobczyk@redhat.com>

	ssl_test: Remove Fedora-related skips
	We don't support Fedora anymore, so we don't need workarounds
	for permissive crypto policies anymore - all supported distros
	disallow tls1 and tls1.1.

2021-05-06  Milan Zamazal  <mzamazal@redhat.com>

	virt: Refuse to run a snapshot with an invalid timeout
	Engine can run a synchronous instead of an asynchronous snapshot by
	mistake.  This is fixed in
	https://gerrit.ovirt.org/c/ovirt-engine/+/114591, but Engines without
	the fix can still attempt to run synchronous snapshots, with zero
	timeout.  Running snapshots with zero timeout makes no sense, so let's
	prevent such attempts by failing them immediately.

	We introduce a new exception / error code for the purpose because
	simply using SnapshotFailed would trigger a wrong flow on the Engine
	side.

2021-05-06  Ales Musil  <amusil@redhat.com>

	net: Use CentOS stream base image for containers

2021-05-04  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: do not poll libvirt for all guest info
	If our logic decides that there is nothing to query qemu-ga for
	(types == 0), which can happen several times every minute, we have to
	bail out and not pass this to the libvirt API. Calling the libvirt API with
	types = 0 will query all available information. This is not something we
	want as it burdens the guest unnecessarily.

	Bug-Url: https://bugzilla.redhat.com/1944495
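	A sketch of the guard (the wrapper name and the fake domain are
	hypothetical; domain.guestInfo(types, flags) is the libvirt-python
	call):

```python
# Hypothetical wrapper illustrating the guard: with types == 0 there is
# nothing to ask for, and passing 0 to libvirt would query *all*
# available information, burdening the guest for no reason.
def query_guest_info(domain, types):
    if types == 0:
        return {}  # bail out before reaching libvirt
    return domain.guestInfo(types, 0)

# Fake stand-in for a libvirt domain, used only to show the behavior.
class FakeDomain:
    def __init__(self):
        self.calls = 0

    def guestInfo(self, types, flags):
        self.calls += 1
        return {"types": types}
```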

2021-04-29  Ales Musil  <amusil@redhat.com>

	spec, automation: Remove python3-netaddr requirement

	net: Replace netaddr with standard lib ipaddress

	ssl: Replace netaddr with standard lib ipaddress

	net, tests: Fix wrong part of the message in pytest.skip

	net, tests: Remove usage of six
	Convert last bits of six usage to its Python 3
	equivalent.

2021-04-28  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.60.6

2021-04-26  Nir Soffer  <nsoffer@redhat.com>

	spec: Update sanlock version for RHEL 8.4
	Sanlock 3.8.3 includes several enhancements[1][2] that we would like
	to use. Require this version on RHEL.

	While updating sanlock requirement, drop the useless python 2
	requirements.

	[1] https://pagure.io/sanlock/c/32a73a539432dbbca7187ddafd4a2b2ae91ea1f8
	[2] https://pagure.io/sanlock/c/0ff9c1ab8852bec846822ee2af55ebcb7e5f5967

2021-04-26  Vojtech Juranek  <vjuranek@redhat.com>

	virt: recover CD during VM recovery
	Recover CD metadata during VM recovery and, if needed, deactivate the
	LV of an unused CD image.

	Bug-Url: https://bugzilla.redhat.com/1589763

	virt: add helper for cd recovery
	Add a helper method for CD recovery. This helper method will be used
	during VM recovery to fix CD metadata and, if needed, deactivate
	unused LVs, e.g. in case a failure happens after activation of the new
	CD but before switching the CD in the VM.

	Bug-Url: https://bugzilla.redhat.com/1589763

	virt: use hwclass.DISK instead of cdrom in CD metadata
	In the past, when a VM started with an inserted CD and the CD was
	ejected while the VM was running, the CD image wasn't deactivated if
	the CD was on a block SD. This was expected as CD on a block SD was
	almost completely broken (the only case which somehow worked was
	starting a VM with a CD and also shutting it down with this CD).
	However, deactivation of the image which was activated during VM start
	up doesn't work properly even after introducing the new mechanism for
	CD change. This is caused by the VM start up code and the CD change
	code storing slightly different VM metadata. The code for starting a
	VM stores the CD in metadata as

	    <ovirt-vm:device devtype="disk" name="sdc">

	while CD change code as

	    <ovirt-vm:device devtype="cdrom" name="sdc">

	As a result, the code handling CD eject doesn't find out there's
	already a CD and doesn't deactivate the image.

	There are two possible ways to fix it:
	- introduce new hwclass "cdrom" and use it during VM start up or just
	  hardcode "cdrom" into metadata when creating it during VM start up
	- use existing hwclass.DISK also for CD change flows

	The second option, using hwclass.DISK for CD change metadata, is more
	consistent with existing code and also ensures that methods like
	Vm.findDriveByUUIDs() work correctly, as these methods rely on VM
	metadata. Using the first approach would require an additional fix
	also for linking metadata. Therefore the patch uses the second option
	and replaces the hard-coded "cdrom" in CD change related methods with
	hwclass.DISK.

	Bug-Url: https://bugzilla.redhat.com/1868643

2021-04-25  Benny Zlotnik  <bzlotnik@redhat.com>

	storage: implement Lease.fence
	This patch introduces the Lease.fence API which will be used to fence a
	job whose host has lost the lease and did not increase the generation.

	The fence will receive the lease metadata (job_id, job_status,
	generation).

	If the generation is the starting generation of the job (usually 0),
	the fence operation will increase it by 1 and change the status of the
	job to FENCED. This way, if the original host comes back to complete
	the operation, it will fail both the generation mismatch validation
	and the job status validation, as the expected status is PENDING
	whenever a host is about to change the status of a job to a completion
	status (FAILED, SUCCEEDED).

	Bug-Url: https://bugzilla.redhat.com/1906074
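	The fencing rule can be sketched as follows (field names and the
	FENCED constant follow the description above; the real implementation
	differs):

```python
# Illustrative sketch of the fencing rule: only a job still at its
# starting generation can be fenced; fencing bumps the generation and
# marks the job FENCED so the original host fails both validations.
def fence(metadata, start_generation=0):
    if metadata["generation"] != start_generation:
        raise RuntimeError("job already progressed, cannot fence")
    fenced = dict(metadata)
    fenced["generation"] += 1
    fenced["job_status"] = "FENCED"
    return fenced
```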

	storage: implement Lease.status
	Lease.status already existed but did nothing. This patch implements the
	method to be utilized by jobs using lease metadata.

	It will be used to poll the job status when engine loses access to the
	host that originally started the job.

	The returned dict will be used to tell the status of the job.
	If the owners list is not empty, engine will assume the job is still
	running because the lease is still held.

	If the owners list is empty, the generation will be used to determine
	whether the job completed successfully or not.

	Bug-Url: https://bugzilla.redhat.com/1906074
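	How an engine-side caller might interpret the result, per the
	description above (a hedged sketch; field names and return values are
	assumptions, not the real API):

```python
# Hedged sketch of interpreting the status dict: a non-empty owners list
# means the lease is still held and the job is still running; otherwise
# the generation tells whether the job got past its starting point.
def interpret_status(status, start_generation=0):
    if status["owners"]:
        return "RUNNING"
    if status["generation"] == start_generation:
        return "NEVER_COMPLETED"
    return "PROGRESSED"
```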

	storage: implement CopyDataExternalEndpoint
	Implement the CopyDataExternalEndpoint to support copying to and from
	Managed Block devices (and possibly many others).

	This patch also introduces the ExternalLease lock to be used for locking
	as the external equivalent of volume leases.

	Bug-Url: https://bugzilla.redhat.com/1906074

	storage: add LeaseMetadata to Lease.create
	Add a metadata field to lease creation. It will be used
	to store initial metadata on the lease's LVB.

	Public fields will be sent by the engine upon lease creation:
	generation, job_id and type.
	The rest of the fields can only be set internally in vdsm.

	Example for data present on LVB:
	{
	  'generation': 0,
	  'job_status': 'PENDING',
	  'job_id': '20e62244-19ad-46b7-afcc-a5fe2007a259',
	  'type': 'job',
	  'host_hardware_id': '6e11cad3-2214-4e1a-aadc-e050accfe3c0',
	  'created': 1614073237,
	  'modified': 1614073237
	}

	Bug-Url: https://bugzilla.redhat.com/1906074

2021-04-23  Roman Bednar  <rbednar@redhat.com>

	xlease: redesign index rebuilding
	The semantics of the xlease index changed and the index slot on
	storage became the source of truth instead of the sanlock resource in
	the volume. The rebuild_index() implementation is thus no longer
	helpful.

	It has to be rewritten completely. For more details on new behavior
	see rebuild_index() docstring.

	Bug-Url: https://bugzilla.redhat.com/1902127

	xlease: remove usage of updating flag during lease add/remove
	Fixes the lease add, remove and lookup flows and related
	documentation. Also enables tests with the new behavior.

	For more details on the new flows see commit message of previous patch.

	Changes to the add/remove/lookup flows change the semantics of the
	index: the source of truth used to be the underlying storage format,
	now it's the index. That means the index no longer acts only as a
	cache of the storage.

	For example: if we remove a lease, the index is updated first and then
	sanlock can fail to clean up the resource on the storage. This is
	enough to free up resources, but it also means that we have an entry
	on storage that is not valid and not present in the index buffer.
	Rebuilding the index at this point using rebuild_index() would lead to
	having orphaned leases in the index. This function needs to change in
	future patches to something more suitable - e.g. iterating the index
	and making sure all leases exist, adding them if needed.

	Adding a lease works the opposite way: we only update the index if
	sanlock succeeds in writing a resource.

	Bug-Url: https://bugzilla.redhat.com/1902127

	tests: add coverage for new index flows
	When sanlock fails to write a resource when adding or removing a lease
	the lookup fails with LeaseUpdating error.

	The updating flag that we use in the storage structure for a record in
	the index slot is the root cause of the related bug. The solution is
	to change the flows so that the updating flag is not used. However,
	removing it completely is problematic because it can still be stored
	on the underlying storage after updating vdsm to a newer version.

	So in case one domain contains host A using an old version and another
	host B using the fixed version, the presence of the updating flag (set
	by host A) must not cause failures on either host.

	A new flow has to be implemented for the lease add, remove, and lookup
	actions.

	1) adding a lease
	If there already is a record and it is updating, it should be reused
	and the updating flag reset. If no record is found for the given lease
	id we create a new entry which defaults to updating=False.

	The order of actions needs to change as well:
	1. Write sanlock resource
	2. Update index

	This will ensure the index is updated only if sanlock resource write
	succeeded.

	2) removing a lease
	Removing the lease will unconditionally set the flag to False.

	The order of actions is the reverse of adding:
	1. Update index
	2. Write sanlock resource

	If the sanlock write fails we can just log the error instead of
	failing, as writing an empty record is enough for removing a lease.

	3) lease lookup
	For lookup we raise NoSuchLease if updating flag is set to True.

	Bug-Url: https://bugzilla.redhat.com/1902127
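	The ordering rules above can be sketched with fake index and storage
	objects (illustrative only, not the real xlease API):

```python
# Illustrative only: the point is the *order* of operations, which keeps
# the index consistent even if the second step fails.
def add_lease(index, storage, lease_id, offset):
    # 1. Write the sanlock resource first.
    storage.write_resource(lease_id, offset)
    # 2. Only then record it in the index, so the index never points to
    # a resource that was not written.
    index.write_record(lease_id, offset)

def remove_lease(index, storage, offset):
    # 1. Clear the index first; this alone is enough to free the slot.
    index.write_empty_record(offset)
    # 2. A failure to clear the storage resource is only logged.
    try:
        storage.clear_resource(offset)
    except OSError as e:
        print("WARNING: cannot clear resource: %s" % e)
```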

	tests: test adding a lease fails when index is full
	The coverage report revealed a missing test for the case when the
	index runs out of space.

	The simplest test would be to keep adding leases using the public
	method LeasesVolume.add() until the index fills up. However, since we
	use the temporary file backend to test the leases volume, this
	operation is slow.

	To speed up the test we can add a memory backend which will serve
	for testing purposes only and will be placed alongside the existing
	backends in xleases.py so it is not hidden.

	The memory backend is using io.BytesIO to simulate a file.

	Adding a new fixture that can be parametrized for tests that require
	non default values for memory backend (alignment, block_size, size).

	The actual test uses this fixture to create a backend with limited
	size so that only one lease can be added and the next attempt raises
	NoSpace. If the parameters are not used, reasonable defaults are
	applied.

	Another limitation is xleases.format_index(), which had a hardcoded
	number of records. Add a parameter so this can be controlled.
	Without it the memory backend always gets rewritten and expanded to
	the index max size silently, because io.BytesIO is dynamically
	allocated, rendering the backend size parameter useless.

	Another approach would be creating a backend with zero size, but that
	does not work because LeasesVolume checks the storage format and fails
	if the format is not met. Alternatively, using a sufficient size but
	with zeroes only results in the same failure.

2021-04-21  Nir Soffer  <nsoffer@redhat.com>

	tests: Fix typos in live merge tests

2021-04-21  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.60.5

2021-04-21  Shmuel Melamud  <smelamud@redhat.com>

	virt: Add error code for VIR_ERR_OPERATION_FAILED
	VIR_ERR_OPERATION_FAILED libvirt error, when occurs during VM migration
	in particular, is reported to the Engine as a generic 'migrateErr'. This
	patch adds a separate 'migOperationErr' code for this error and passes
	the accompanying message to the Engine, so it will be able to describe
	the situation to the user.

	Bug-Url: https://bugzilla.redhat.com/1717411

2021-04-21  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Show extend number in all extend logs
	Log the attempt number in all extend logs:
	- When an extend completes successfully
	- When an extend completes after job was untracked
	- When an extend completes after job switched to COMMIT state
	- When an extend is running

	This will make it easier to understand what the system is doing when we
	troubleshoot extend issues.

	livemerge: Fix handling of last extend failure
	If we start 10 extends because of a timeout, and then one of the older
	extends fails, we would fail the entire merge, instead of waiting
	until the last extend attempt completes or times out.

	Now we pass the attempt number to the callback, so we can consider the
	real attempt number, and fail the entire merge only if the last attempt
	has failed.

	livemerge: Do not retry extend on errors
	The motivation for retrying extend on errors was to recover faster
	from errors. However, if an extend fails because of a temporary issue,
	like inaccessible storage, waiting before retrying may increase the
	chance to recover from the error.

	Retrying extend can also lead to an extend storm, when an extend
	request fails quickly and starts another extend attempt, which will
	fail again quickly and start the next attempt. We may use all retries
	in a few seconds, which is not useful.

	Fixed by removing the retry on extend errors. The next retry will start
	on the next timeout. This is also much simpler to reason about.

2021-04-21  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: remove parent_checkpoint_id
	Since all the checkpoints are active on the VM there is
	no dependency between them. When a checkpoint is used
	to start a backup from that point in time, there is no
	need to link it to the previous checkpoint that was taken
	and chain them.

	Chaining the checkpoints is done by specifying the parent
	checkpoint ID; that element can be dropped now from the
	checkpoint XML, and the parent_checkpoint_id has no other
	use and can be removed.

	This patch removes the parent_checkpoint_id from the
	checkpoint XML and the backup config parameters, along with
	the validation related to that field.
	Also, fix the tests accordingly.

	Bug-Url: https://bugzilla.redhat.com/1950752

	backup.py: remove parent_checkpoint_id validation
	Libvirt supports the option to redefine a single checkpoint and not
	the entire checkpoint chain; this can be done because now all the
	checkpoints in the chain are active and not just the leaf checkpoint.

	Supporting single checkpoint redefinition means that there is no use
	for the parent checkpoint ID anymore, and the validation for it is not
	needed.

	This patch removes the validation for the parent_checkpoint_id.

	Bug-Url: https://bugzilla.redhat.com/1950752

2021-04-21  Nir Soffer  <nsoffer@redhat.com>

	tests: Don't wait forever for cleanup thread
	Always use a timeout to avoid blocking all tests if cleanup thread is
	broken.

	tests: Fix flaky tests
	We waited until the cleanup thread called blockJobAbort() and assumed
	that the cleanup thread finished right after that. Then we called
	vm.query_jobs() to start a new cleanup and verify the persisted job
	state. If the thread did not finish, a new cleanup is not started, and
	the test fails.

	Fix by using DriveMerger.wait_for_cleanup(), waiting until all cleanup
	threads finish with a timeout.

	livemerge: Add cleanup timeout
	Add optional timeout when waiting for cleanup. This will be useful for
	testing, when we need to synchronize with cleanup thread.

2021-04-21  Liran Rotenberg  <lrotenbe@redhat.com>

	snapshot: fix abort log
	When the snapshot operation is aborted, it can be caused by VDSM
	internally when it times out, or by libvirt. Until now, in both cases
	we logged the abort operation as if it was triggered by VDSM. Now we
	consider a libvirt abort as well.

	Bug-Url: https://bugzilla.redhat.com/1933669

2021-04-19  Milan Zamazal  <mzamazal@redhat.com>

	virt: Prevent logging VM external data in supervdsm.log
	VM external data is logged in supervdsm.log when logging supervdsm
	calls on debug level.  This data is sensitive and shouldn't be exposed
	in logs, so let's protect it from logging in supervdsm calls.

	Bug-Url: https://bugzilla.redhat.com/1949146

2021-04-16  Ales Musil  <amusil@redhat.com>

	net: Fix gateway not being removed from non-default route network
	There was an oversight in the implementation of the old source route
	removal. Remove the gateway also in cases when the gateway was on the
	network but is now missing.

	Bug-Url: https://bugzilla.redhat.com/1949995

2021-04-15  Vojtech Juranek  <vjuranek@redhat.com>

	spec: update qemu-kvm requirement
	Bump qemu-kvm version to fix deadlock in block_resize.

	Bug-Url: https://bugzilla.redhat.com/1948532

2021-04-15  Ritesh Chikatwar  <rchikatw@redhat.com>

	storageServer: Added validation for gluster volume type
	Added validation of the gluster volume type for
	disperse volumes, as disperse volumes are unsupported.

	Bug-Url: https://bugzilla.redhat.com/1940118

2021-04-14  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.60.4

2021-04-13  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Fix cleanup if active commit job has gone
	If an active commit block job is canceled outside of vdsm while a job
	is in CLEANUP state, for example using virsh, vdsm will be stuck in an
	endless pivot retry loop.

	In the past vdsm could detect this case since it checked job live info
	on every call to queryBlockJobs(). Since we introduced job states, the
	check is done only in COMMIT state in _update_commit().

	Canceling job outside of vdsm should not happen, but this may be a good
	way to cancel jobs that vdsm cannot complete, for example if pivot
	fails because of a libvirt or qemu bug[1]. If this happens, we want
	vdsm to
	detect the situation and abort the merge.

	Fix by checking if the libvirt block job exists also in CLEANUP state.
	If the job is gone, we change job.pivot to False and retry the
	cleanup.

	[1] https://bugzilla.redhat.com/1945635

	Bug-Url: https://bugzilla.redhat.com/1945675

	livemerge: Keep pivot state in job
	Previously we checked if a job is ready before retrying cleanup. The
	motivation was to handle unexpected case when libvirt job moves from
	ready state to non-ready state. In this case we create the cleanup
	thread with doPivot=False. This cannot work because the cleanup thread
	does not complete the block job if doPivot=False, and we may leave a
	running block job, which cannot be recovered.

	Fortunately, libvirt block job never moves from "ready" state after
	ready state was reported, so we never try to cleanup active commit with
	doPivot=False.

	Change to check if job is ready only in COMMIT state. When an active
	commit job is ready for pivot, we set job.pivot to True. The next
	cleanup attempt will try to pivot without checking libvirt state again.
	The pivot flag is persisted in the VM metadata, so after recovery we
	will try to pivot again.

	Bug-Url: https://bugzilla.redhat.com/1945675

	livemerge: Always retry failed pivot
	In commit bbb7e6f8ded52025308727dc3f29d514ea175bfc

	    vm: Stop live merge cleanup attempts for unrecoverable pivot error

	We changed error handling for unexpected libvirt errors when trying to
	pivot. Based on advice from libvirt folks, we assumed that unexpected
	error is not recoverable, and we aborted the merge job immediately.

	There are major issues with that commit:

	- The assumption that an unexpected error is not recoverable is not
	  always correct, as we can see in bug[1]. In this bug, the qemu block
	  job state is flipping between "ready" and "standby", and retrying
	  the pivot several times is very likely to complete successfully.

	- We aborted the merge job without aborting the libvirt block job!
	  This leaves the merge in an unrecoverable state; because the libvirt
	  job is still running, starting a new merge will fail immediately.
	  And because the top volume is marked as ILLEGAL, the VM will not run
	  after you stop the VM.

	Fixing this issue requires 2 changes:

	1. Always retry cleanup after pivot errors
	2. Limit the number of retries and abort the libvirt block job.

	This change implements the first part, which is easy, fixes the
	regression, and should be enough to mitigate bug[1] in case it is not
	fixed soon in qemu.

	The second part of the fix is more complicated and will be done later,
	since we have more urgent work to do now.

	[1] https://bugzilla.redhat.com/1945635

	Bug-Url: https://bugzilla.redhat.com/1945675

	livemerge: Improve logging when looking up drive
	- If drive is not found, log the drive UUIDs instead of the drive name.
	- Unify wording in similar logs.

	livemerge: Log storage error when getting volume size
	The UnavailableStorage may include useful info about the error.

	vm: Cleanup logs during diskSizeExtend
	- Do not format logs, always pass arguments to logger
	- Include drive volumeID in logs related to volume size
	- Unify logs for checking requested size
	- Unify logs when libvirt blockResize fail
	- Unify logs when checking current disk size
	- Remove unneeded words in log messages

	vm: Log blockResize libvirt calls
	Every libvirt call should have debug log showing the arguments. This
	makes it easier to understand how libvirt was called when we need to
	debug libvirt issues.

	vm: Fix call to blockResize
	Use flags=value instead of treating it as a positional argument.
	Current code will break if libvirt adds another keyword argument
	before flags.
2021-04-13  Ales Musil  <amusil@redhat.com>

	net: Skip empty lines when parsing tc output
	The latest tc had a tiny change in its output format.

	tc on el8.3:
	"... \n \tindex 1 ref 1 bind 1\n \n\taction order 2 ..."

	tc on el8.4:
	"... \n\tindex 3 ref 1 bind 1\n\n\taction order 2 ..."

	The difference is in spaces between tabs and newlines.

	When parsing those outputs, skip blank lines completely as they do
	not provide any useful information.

	Bug-Url: https://bugzilla.redhat.com/1949048
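	A minimal sketch of the blank-line-skipping parse (illustrative, not
	the actual vdsm parser):

```python
# Skip blank (or whitespace-only) lines so that the el8.3 output
# ("\n \n") and the el8.4 output ("\n\n") parse identically.
# Illustrative helper, not the real vdsm tc parser.
def actions(tc_output):
    for line in tc_output.splitlines():
        line = line.strip()
        if not line:        # blank or whitespace-only line: skip
            continue
        yield line

el83 = "index 1 ref 1 bind 1\n \n\taction order 2"
el84 = "index 1 ref 1 bind 1\n\n\taction order 2"
assert list(actions(el83)) == list(actions(el84))
```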

2021-04-13  Benny Zlotnik  <bzlotnik@redhat.com>

	storage: extract _next_generation helper
	Move _next_generation helper to storage/utils

	Bug-Url: https://bugzilla.redhat.com/1906074

	storage: move copy_data logger to module level
	This patch moves the logger outside of the class to the module level,
	allowing other classes in this module to use it.

	Bug-Url: https://bugzilla.redhat.com/1906074

	managedvolume: move MBS in/appropriation to attach/detach
	Currently MBS device appropriation is done in clientIF before starting
	a VM. A better place for it would be attach_volume as it can be reused
	for other flows requiring appropriation such as copy_data.

	Same for MBS device inappropriation to detach_volume.

	The rule file looks like this for RBD devices:
	$ cat /etc/udev/rules.d/99-vdsm-managed_96d0....rules
	SYMLINK=="rbd/volumes/volume-96d0...", RUN+="/usr/bin/chown..."

	For iSCSI devices:
	$ cat /etc/udev/rules.d/99-vdsm-managed_967852b3-....rules
	SYMLINK=="mapper/360...", RUN+="/usr/bin/chown..."

	Bug-Url: https://bugzilla.redhat.com/1906074

	udev: add functions for managed devices
	This patch adds specific functions to handle udev operations for
	MBS devices
	* add_managed_udev_rule
	* trigger_managed_udev_rule
	* remove_managed_udev_rule

	We need separate functions as appropriateDevice expects a "thiefId",
	which is essentially the ID of the VM that owns the udev rule. For
	MBS devices, we do not have a VM, as we run the rule before starting
	the VM or in flows where a VM is not involved, like copy_data.

	Bug-Url: https://bugzilla.redhat.com/1906074

	virt: add "managed" metadata key to drives
	The "managed" key will indicate whether the drive is already managed
	(Managed Block Storage) or needs to be managed by vdsm.

	This will be used to avoid device appropriation when starting a VM
	with managed disks attached, as they are appropriated before the VM
	starts, when they are connected to the host.

	Example of how the metadata looks with this field:
	<ovirt-vm:device devtype="disk" name="sdb">
	    <ovirt-vm:GUID>3600a098038304479363f4c4870455167</ovirt-vm:GUID>
	    <ovirt-vm:imageID>3600a098038304479363f4c4870455167</ovirt-vm:imageID>
	    <ovirt-vm:managed type="bool">true</ovirt-vm:managed>
	</ovirt-vm:device>

	Bug-Url: https://bugzilla.redhat.com/1906074
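	A sketch of reading the "managed" flag with ElementTree; the
	namespace URI and the helper are assumptions for illustration, not
	the vdsm code:

```python
import xml.etree.ElementTree as ET

# The ovirt-vm namespace URI below is an assumption for illustration.
NS = {"ovirt-vm": "http://ovirt.org/vm/1.0"}

DEVICE_XML = """
<ovirt-vm:device devtype="disk" name="sdb"
                 xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
  <ovirt-vm:managed type="bool">true</ovirt-vm:managed>
</ovirt-vm:device>
"""

def is_managed(device_xml):
    # A missing key means the drive still needs to be managed by vdsm.
    dev = ET.fromstring(device_xml)
    elem = dev.find("ovirt-vm:managed", NS)
    return elem is not None and elem.text == "true"

assert is_managed(DEVICE_XML)
```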

2021-04-13  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: always report fake guest application info
	VDSM reports fake entries in the guest application list for qemu-ga
	and for the linux kernel. These were however reported only if oVirt
	guest agent was not running or if the application list did not
	contain an entry for qemu-ga.

	Because the qemu-ga entry reported by oVirt guest agent does not
	contain the qemu-ga version, this behavior is no longer favorable:
	Engine needs to know the version to be able to advise the user to
	update the tools. So instead, we now always report the fake entries.

	To avoid some duplicates in the output, entries "QEMU guest agent"
	(on Windows) and "kernel-<version>" (on Linux) produced by oVirt guest
	agent are removed.

	Bug-Url: https://bugzilla.redhat.com/1936298

2021-04-12  Michal Skrivanek  <michal.skrivanek@redhat.com>

	spec: update libvirt requirement
	fix libvirt regression in libvirt-guests service killing all running VMs

	Bug-Url: https://bugzilla.redhat.com/1946204
	Bug-Url: https://bugzilla.redhat.com/1821199
	Bug-Url: https://bugzilla.redhat.com/1940484

2021-04-12  Marcin Sobczyk  <msobczyk@redhat.com>

	coverage: Fix coverage collection
	We've had all the parts necessary to run vdsm with coverage
	enabled for a long time. The 'coverage' library has always been
	specific that its registered 'atexit' handlers need to run
	for the coverage data dump to succeed. At some point
	the coverage setup stopped working. It's hard to tell when exactly,
	but my guess is it simply never worked on py3.

	After some analysis it turned out that vdsm's and supervdsm's
	custom SIGTERM handlers cause the registered 'atexit' functions
	not to be called.  This makes the coverage report work only
	for the spawned sub-processes, but not the main process.
	The effect is that a basic suite run reports ~35% coverage [2]
	on vdsm's codebase (with some obvious paths missing like most
	of the 'virt.vm' module). With this patch it's ~58% [3].

	Unfortunately there's no other way for the 'atexit' handlers
	to execute other than calling the private 'atexit._run_exitfuncs'
	function. That's why the code is called only when the coverage
	is enabled in the config.

	With this patch and some pending patches to OST [4] the vdsm coverage
	should work properly again soon.

	[1] https://coverage.readthedocs.io/en/coverage-5.4/subprocess.html?highlight=atexit#signal-handlers-and-atexit
	[2] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7782/artifact/exported-artifacts/coverage/html/index.html
	[3] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/7784/artifact/exported-artifacts/coverage/html/index.html
	[4] https://gerrit.ovirt.org/#/q/topic:ost-vdsm-coverage+(status:open+OR+status:merged)
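	A minimal sketch of the workaround (illustrative handler, not the
	actual vdsm code): a custom SIGTERM handler normally bypasses the
	registered 'atexit' callbacks, so they are invoked explicitly before
	exiting:

```python
import atexit
import signal
import sys

ran = []
atexit.register(lambda: ran.append("dumped"))

def sigterm_handler(signum, frame):
    # Without this call, exiting from a custom signal handler would
    # skip the registered atexit functions, losing the coverage dump.
    # _run_exitfuncs() is private API, hence guarded by config in the
    # real code.
    atexit._run_exitfuncs()
    sys.exit(0)

signal.signal(signal.SIGTERM, sigterm_handler)

# Demonstrate: invoking the handler runs the atexit callbacks.
try:
    sigterm_handler(signal.SIGTERM, None)
except SystemExit:
    pass
assert ran == ["dumped"]
```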

2021-04-08  Vojtech Juranek  <vjuranek@redhat.com>

	tests: add test for lvm.re_pvName
	This is a follow-up patch for the fix in the lvm.re_pvName regular
	expression done in the previous patch.

2021-04-08  Nir Soffer  <nsoffer@redhat.com>

	lvm: Disable wiping signatures
	When zeroing a new logical volume, lvm looks for file system
	signatures on the logical volume and wipes them. For every signature
	found, lvm shows a confirmation prompt, which fails a non-interactive
	command by default. This behavior is new in RHEL 8.4; before that,
	wiping signatures was disabled by default.

	In commit e9b806346fbe6d30bc4c82274b53328d17e07e82

	    lvm: Do not prompt when wiping signatures

	We added --yes to confirm wiping, however this is risky, since --yes is
	applied to all confirmations, even future confirmations that we may not
	want to confirm.

	For our use case, we don't need to wipe signatures. We control the VG
	and all LVs, and we know that they are safe to use regardless of
	previous content on the logical volume, left by another user of the
	LV. If wiping signatures is disabled, LVM zeroes the 4k at the start
	of the device, which destroys most signatures without confirmation.

	If users want to wipe volumes before using them, they can use "wipefs"
	or similar tools inside the guest.

	We don't want to change the lvm defaults on the host, since they are
	safer for general use, so we disable wiping in the lvcreate command
	invoked by vdsm.

	Bug-Url: https://bugzilla.redhat.com/1946199
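	A sketch of passing the stock lvm.conf knob per command via --config;
	whether vdsm builds the command exactly this way is an assumption for
	illustration:

```python
# The allocation/wipe_signatures_when_zeroing_new_lvs option is the
# standard lvm.conf knob; passing it via --config affects only this
# invocation. Command construction is illustrative, not vdsm's.
LVM_CONF = "allocation { wipe_signatures_when_zeroing_new_lvs = 0 }"

def lvcreate_cmd(vg, lv, size_mb):
    return [
        "lvcreate",
        "--config", LVM_CONF,   # no prompt: signatures are not wiped
        "--name", lv,
        "--size", "%dm" % size_mb,
        vg,
    ]

cmd = lvcreate_cmd("vg0", "lv0", 128)
assert "--config" in cmd
```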

	tests: Test lvcreate with leftover file system
	In RHEL 8.4 lvm enabled the --wipesignatures option by default. This
	detects signatures on new lvs and requires manual confirmation before
	zeroing the start of the lv.

	Add a test reproducing the original issue and checking that we always
	zero the start of the lv.

	Bug-Url: https://bugzilla.redhat.com/1946199

2021-04-08  Ales Musil  <amusil@redhat.com>

	automation: Remove check-network stage
	The check-network stage was not usable
	on ovirt jenkins without container backend.

2021-04-07  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.60.3

2021-04-07  Nir Soffer  <nsoffer@redhat.com>

	lvm: Do not prompt when wiping signatures
	LVM changed the behavior when wiping signatures during lvcreate, and now
	it requires manual confirmation. This breaks lvcreate calls in vdsm
	if the lv has leftovers from previous usage of the lv.

	Add --yes to lvcreate to avoid the confirmation.

	More work may be needed to update the tests. I post this to check if
	it unblocks the automated tests.

	Bug-Url: https://bugzilla.redhat.com/1946199
	Related-to: https://bugzilla.redhat.com/1894692

2021-04-07  Vojtech Juranek  <vjuranek@redhat.com>

	test: improve formatting of Config.__repr__()

2021-04-07  Nir Soffer  <nsoffer@redhat.com>

	tests: Mark 4k loopback test as xfail
	This test used to fail randomly on oVirt CI, but now it fails randomly
	also on Travis. Mark it as xfail on both oVirt and Travis CI.

2021-04-07  Dan Kenigsberg  <danken@redhat.com>

	localfssd_test: __repr__ should not return None
	reported by https://sonarcloud.io/project/issues?id=oVirt_vdsm&open=AXiMvBrJnkb2YL4ArADa&resolved=false&types=BUG

2021-04-05  Vojtech Juranek  <vjuranek@redhat.com>

	storage: use raw string literals for regular expressions
	There are lots of warnings in the tests like this:

	    vdsm/lib/vdsm/storage/blockSD.py:189: DeprecationWarning: invalid escape sequence \d
	    LVM_ENC_ESCAPE = re.compile("&(\d+)&")

	Use raw string literals for regular expressions to avoid these
	warnings and make the test log easier to read.
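	A minimal example of the fix:

```python
import re

# Plain "&(\d+)&" triggers a DeprecationWarning for the invalid
# escape "\d" when the module is parsed; the r"" prefix keeps the
# backslash intact and warning-free.
LVM_ENC_ESCAPE = re.compile(r"&(\d+)&")

assert LVM_ENC_ESCAPE.match("&123&").group(1) == "123"
```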

	storage: remove backslashes from comments
	Comments with backslashes are one source of warnings about invalid
	escape sequences. Remove backslashes from comments.

	tests: use raw string literals for regular expressions
	There are test warnings like this:

	   vdsm/tests/virt/libvirtnetwork_test.py:48: DeprecationWarning: invalid escape sequence \s
	   a_xml_normalized = re.sub(b'>\s*\n\s*<', b'><', a_xml).strip()

	Use raw string literals for regular expressions to avoid these
	warnings and make the test log easier to read.

2021-04-01  Vojtech Juranek  <vjuranek@redhat.com>

	storage: use warning instead of warn
	Log.warn() is deprecated and Log.warning() should be used instead.
	Replace log.warn() with log.warning().

	This also reduces the number of warnings when running the tests and
	makes the test log easier to read.

2021-04-01  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Retry extend after errors or timeout
	Change the EXTEND phase to retry the operation after timeouts or errors.
	This is similar to the drive monitor extend mechanism, but using larger
	timeout and limited number of retries.

	Extend takes typically 2-6 seconds, but can be much slower when a system
	is overloaded. Use a 10 second extend timeout, to avoid sending too many
	extend requests. Practically, because we don't use a timer but poll jobs
	every 15 seconds, the actual timeout will be 10-25 seconds.

	Use number of retries instead of a time limit when aborting extend. This
	should give more time when a system is overloaded, since in this case we
	are likely to be blocked on storage, and query jobs less often.

	Before starting the first extend, we store the base and top volume sizes
	in the extend info dict. Storing the base size ensures that we don't
	extend the base volume more than needed, in case the volume was
	extended right before we retry.

	When retrying extend, we update the top volume size before starting the
	next extend, since the top volume may be extended automatically while
	waiting for base volume extend completion, and we may need larger extend
	at the time of the retry.

	Here is an example extend metadata during extend:

	{
	    "attempt": 1,
	    "base_size": 1073741824,
	    "top_size": 3221225472,
	    "started": 15051.875996275
	}

	Here are example extend logs taken from new tests:

	INFO    (MainThread) [virt.livemerge] Starting extend 1/10 for
	job=5c0a3978-d3c0-4acb-84ee-a52bf0088878 drive=sda
	volume=75658c40-205d-48f0-87b3-b21530097d76 (livemerge:458)

	WARNING (MainThread) [virt.livemerge] Extend 1/10 timeout for job
	5c0a3978-d3c0-4acb-84ee-a52bf0088878, retrying (livemerge:651)

	INFO    (MainThread) [virt.livemerge] Starting extend 2/10 for
	job=5c0a3978-d3c0-4acb-84ee-a52bf0088878 drive=sda
	volume=75658c40-205d-48f0-87b3-b21530097d76 (livemerge:458)

	...

	INFO    (MainThread) [virt.livemerge] Starting extend 10/10 for
	job=5c0a3978-d3c0-4acb-84ee-a52bf0088878 drive=sda
	volume=75658c40-205d-48f0-87b3-b21530097d76 (livemerge:458)

	ERROR   (MainThread) [virt.livemerge] Extend 10/10 timeout for job
	5c0a3978-d3c0-4acb-84ee-a52bf0088878, untracking job (livemerge:656)

	I tested this using a stress test[1] for deleting snapshot under extreme
	load. Here is the distribution of attempts during the tests:

	-------------------------------
	attempts    extends    fraction
	-------------------------------
	       1        674       67.5%
	       2        173       17.3%
	       3         77        7.7%
	       4         40        4.0%
	       5         16        1.6%
	       6         12        1.2%
	       7          3        0.3%
	       8          2        0.2%
	-------------------------------

	[1] https://gitlab.com/nirs/ovirt-stress/-/tree/master/delete-snapshot
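	A minimal sketch of the bounded retry policy described above (names
	and numbers are illustrative, not the vdsm implementation):

```python
# Retry the extend after a timeout or error; give up and untrack the
# job after a fixed number of attempts. Illustrative sketch only.
MAX_ATTEMPTS = 10

def run_extend(extend_once):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            if extend_once():
                return attempt      # extend completed
        except Exception:
            pass                    # error: retry, like a timeout
    raise RuntimeError("extend timed out, untracking job")

# Simulated extend that succeeds on the third attempt.
state = {"calls": 0}

def fake_extend():
    state["calls"] += 1
    return state["calls"] >= 3

assert run_extend(fake_extend) == 3
```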

	tests: Consume extend requests
	When simulating volume extension, pop the extend request from the
	requests list. The extend callback can be called only once, so leaving
	the request in the list can cause trouble later.

	vm: Make getVolumeSize() public
	This helper is needed in live merge to implement extend retry.

	tests: Use fake_time.time
	When advancing the fake clock, we will modify its time attribute. Use
	the time attribute also for getting the current time for consistency.

	live-merge: Prepare for extend retries
	When extend does not complete, we need to retry the operation. To
	implement this we need to keep more info about the extend. Replace
	"extend_started" with an "extend" dict. This makes it easy to add
	more fields as needed, minimizes the size of the metadata when we are
	not in EXTEND state, and simplifies the code managing the metadata.

	Here is an example metadata with this change:

	{
	    "5c0a3978-d3c0-4acb-84ee-a52bf0088878": {
	        "bandwidth": 0,
	        "baseVolume": "2939852e-187c-48b4-b57a-1f8ea5bc94f8",
	        "disk": {
	            "poolID": "84fab540-fbf2-11ea-a568-5254002fb5cc",
	            "domainID": "c0b55558-5038-4a2b-bc96-47aff28d0218",
	            "imageID": "222d9718-f2d9-4b95-bd7f-e8dfe565cf56",
	            "volumeID": "6c0f1c64-90ac-4a68-839d-5a47af1d3890"
	        },
	        "drive": "sda",
	        "extend": {
	            "started": 15051.875996275
	        },
	        "jobID": "5c0a3978-d3c0-4acb-84ee-a52bf0088878",
	        "state": "EXTEND",
	        "topVolume": "6c0f1c64-90ac-4a68-839d-5a47af1d3890"
	    }
	}

2021-04-01  Ales Musil  <amusil@redhat.com>

	automation: Clarify jenkins runner for all jobs
	Now that we can specify which jenkins should run the
	job, add this parameter to all jobs to make it clear
	where each job is running.

2021-03-30  Vojtech Juranek  <vjuranek@redhat.com>

	pytest: use --strict-markers option
	Currently we use the `--strict` option, which is deprecated since
	version 6.2. Pytest prints a warning about it:

	    The --strict option is deprecated, use --strict-markers instead.

	Use `--strict-markers` instead of `--strict`.

2021-03-30  Ales Musil  <amusil@redhat.com>

	hooks: Restart fcoe and lldpad services only if needed
	The fcoe hook always restarted those services even if
	there was no fcoe-related change. Add a check whether
	the service restart is really needed.

	Bug-Url: https://bugzilla.redhat.com/1940569
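	A minimal sketch of the restart-only-if-needed check (function and
	service handling are illustrative, not the hook's code):

```python
# Restart fcoe/lldpad only when the hook actually changed something.
# Illustrative sketch; the real hook inspects fcoe configuration.
def restart_if_needed(changed, services, restart):
    if not changed:
        return []               # nothing fcoe-related changed
    for svc in services:
        restart(svc)
    return list(services)

restarted = []
assert restart_if_needed(False, ["fcoe", "lldpad"], restarted.append) == []
assert restarted == []
assert restart_if_needed(True, ["fcoe", "lldpad"], restarted.append) == ["fcoe", "lldpad"]
assert restarted == ["fcoe", "lldpad"]
```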

2021-03-29  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: optionally forbid reading of empty external data
	An option was added to allow/forbid reading empty data (a zero-sized
	file or an empty directory). In case the requirement is broken, an
	ExternalDataFailure exception is raised. This can slightly mitigate
	situations where data are damaged (truncated) due to host crashes as it
	prevents sending the empty data to Engine.

	The default (affecting both NVRAM and TPM) is that empty data are not
	allowed.

	Bug-Url: https://bugzilla.redhat.com/1943141
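	A minimal sketch of the check; ExternalDataFailure is the exception
	named above, but the stand-in class and helper here are illustrative,
	not the vdsm implementation:

```python
import os
import tempfile

class ExternalDataFailure(Exception):
    pass

def read_external_data(path, allow_empty=False):
    # Zero-sized data (e.g. truncated by a host crash) must not be
    # sent to Engine unless explicitly allowed.
    with open(path, "rb") as f:
        data = f.read()
    if not data and not allow_empty:
        raise ExternalDataFailure("empty external data: %s" % path)
    return data

# A zero-sized file is rejected by default...
with tempfile.NamedTemporaryFile(delete=False) as f:
    empty = f.name
try:
    read_external_data(empty)
    raised = False
except ExternalDataFailure:
    raised = True
assert raised
# ...but allowed when explicitly requested.
assert read_external_data(empty, allow_empty=True) == b""
os.unlink(empty)
```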

2021-03-24  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.60.2

2021-03-24  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: use libvirt events to track channel state
	We can use events from libvirt to track the state of the qemu-ga
	channel. Libvirt reports when the agent connects to/disconnects from
	the virtio channel. This is pretty useful because we're not in
	complete darkness anymore and it gives us at least a clue whether the
	agent is there or not. Obviously the agent can still fail to respond
	to our commands, either because it is stuck or because it is
	performing another command, but most of our issues stemmed from the
	fact that we didn't have a clue whether the agent was there and were
	blindly trying to run commands (hoping for success).

	This also fixes an issue that we have had ever since we switched to
	the libvirt interface. For VMs without the guest agent, libvirt was
	spamming the journal with messages about the agent being unavailable.

	virt: retrieve state of channels
	It is possible to get the state of a channel (connected/disconnected)
	from the domain XML. This gives us information on whether there is
	any process attached on the guest side. It is not yet needed for
	anything, but it will be used in a follow-up patch.

	docs: update qemu-ga documentation concerning libvirt interface

2021-03-24  Michal Skrivanek  <michal.skrivanek@redhat.com>

	spec: update kernel dependency
	required for live migration between 8.2 and 8.3 hosts after Intel's removal
	of TSX from Skylake and Cascadelake CPUs

	Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1907973

2021-03-24  Ales Musil  <amusil@redhat.com>

	net: Properly remove old source routes
	There were some unnoticed cases in which outdated
	source routes were not removed. This was uncovered by
	testing vdsm with nmstate 1.0. To prevent this issue,
	add a method to remove the source route when it is relevant.

	The most important flows are a switch from static to
	dynamic or vice versa, removal of a network (the kernel
	has removed the routes but not the rules), and a change
	of gateway.

	At the same time optimize addition of source routes.

2021-03-23  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: add bitmap name for disks when redefining a checkpoint
	Libvirt changed the default behavior after dropping the need to
	redefine with the VM domain XML. Before that, when a checkpoint was
	redefined, if the bitmap name wasn't mentioned for the disk it was
	taken from the domaincheckpoint name.

	Now, Libvirt changed that behavior so we should provide the bitmap
	name for each disk in the backup.

	Bug-Url: https://bugzilla.redhat.com/1941593

2021-03-22  Benny Zlotnik  <bzlotnik@redhat.com>

	sd: add external lease methods
	- acquire_external_lease
	- release_external_lease
	- set_lvb
	- get_lvb

	Bug-Url: https://bugzilla.redhat.com/1906074

	api: introduce CopyDataExternalEndpoint
	Add CopyDataExternalEndpoint to allow copying to and from managed block
	volumes

	Bug-Url: https://bugzilla.redhat.com/1906074

	clusterlock: add set_lvb and get_lvb
	This patch introduces the methods set_lvb and get_lvb to support writing
	and reading data from LVB. These methods will use the bindings
	introduced in sanlock[1][2]

	[1] https://pagure.io/sanlock/c/2ea4446a06079a71266fd9f5066dd2909c7546d6
	[2] https://pagure.io/sanlock/c/9034b7b9c7bae930c57de9d96dd8280343baf5f1

	Bug-Url: https://bugzilla.redhat.com/1906074

2021-03-22  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Fix wrong name
	Vm.refreshDriveVolume() was renamed recently to
	Vm.refresh_drive_volume(). When the live merge code moved to livemerge
	module in:

	commit a8c3d80af92c2faa0713603d3d1eef2e5418249b

	    vm: Move live merge code to livemerge module

	The old name was used by mistake. Unfortunately this flow is not covered
	by the tests since it is hard to create a test vm with this
	configuration, and the edge case of deleting snapshot on a raw volume
	after extending the disk was not tested.

	Bug-Url: https://bugzilla.redhat.com/1941311
	Reported-by: Evelina Shames <eshames@redhat.com>

2021-03-18  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: don't break reset_failure() on unknown ID

	tests: fix test_all_channels_extra_domain() in vmxml
	The test (added in commit b8d50777) is broken and it looks like it
	never really worked. It compares a list of strings (channel names)
	with a list of tuples (the result of DomainDescriptor.all_channels())
	for non-equality. It seems that the original intent was to make sure
	that all_channels() really returns all channels properly and not just
	guest agent channels. So let's fix it this way.

	At the time the test was added the function all_channels() also listed a
	spice channel. But since commit 2a786fb this is no longer true and
	the spice channel is not returned. To remedy the situation, a new
	channel is added to the test domain XML.

	virt: move vmchannels.AGENT_DEVICE_NAMES to tests
	The constant AGENT_DEVICE_NAMES is no longer used in the code and
	remains only in tests. Let's move it to tests where it belongs.

2021-03-18  Ales Musil  <amusil@redhat.com>

	virt, net: Handle failover devices with duplicate MAC
	Store teaming flag for interfaces with teaming attribute.
	Because those interface pairs have duplicate MAC we need
	to match them also with alias to ensure they are uniquely
	identified and updated.

2021-03-17  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.60.1

2021-03-16  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Increase extend timeout
	Extending a volume takes usually 2-6 seconds. But when a host is
	overloaded, it can be much slower.

	I tested extreme case, vm with 16 disks, when each disk had 2 GiB of
	data to commit during the merge. 5 disks failed the merge with extend
	timeout:

	2021-03-10 01:11:46,920+0200 ERROR (periodic/3) [virt.livemerge] Extend
	timeout for job a4bc1fa7-5e8d-49fd-9846-ec1c6db04d73, untracking job
	(livemerge:603)

	The extend finished successfully a few seconds later:

	2021-03-10 01:11:48,605+0200 INFO  (mailbox-hsm/1) [virt.vm]
	(vmId='9ee1b9c9-90f6-48db-8c1d-3a97302ec15d') Extend volume
	29b3702c-3066-46e3-95b9-9cbf3f05e1ae completed <Clock(total=34.39,
	extend-volume=17.16, refresh-volume=8.90)> (vm:1475)

	The actual extend that happens on the SPM took only 17.1 seconds, but
	refreshing the volume on the host trying to merge 16 disks at the same
	time took 8.9 seconds. Additional 8.3 seconds spent around these
	operations, probably because the host was overloaded.

	I ran this test on a vm on my laptop, with storage served by another
	vm using the laptop's builtin SSD. Real storage can be 10 times
	faster, so this is less likely to happen on a real system.

	I got data from a real user environment where we experienced delete
	snapshot failures. Based on vdsm logs from this environment, the
	maximum extend time was 10 seconds.

	Based on this, I think it will be safe to use larger default timeout.
	Increased the timeout to 60 seconds.

	If extend fails, there is no change on storage, but engine marks the
	disk as illegal. Trying to merge again will succeed.

2021-03-16  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: add flag for enabling/disabling ballooning
	Even though we want to prevent certain VMs from ballooning the memory,
	we still want to keep the balloon device in such VMs because that's how
	we get memory statistics from the guest. So instead of removing balloon
	device (and losing statistics) it makes sense to carry a flag whether
	ballooning should be enabled or disabled. We read the initial value from
	metadata key 'ballooningEnabled' (or default to True if not present) and
	pass it over to MOM. We also ignore any balloon target changes for VMs
	that have ballooning disabled.

	Bug-Url: https://bugzilla.redhat.com/1917718

2021-03-08  Martin Perina  <mperina@redhat.com>

	core: Add support for cluster level 4.6
	This patch adds support for cluster compatibility level 4.6. VDSM will
	report support for 4.6 level only when it's running on CentOS/RHEL 8.4
	with Advanced Virtualization 8.4 (libvirt >= 7.0.0).

	Bug-Url: https://bugzilla.redhat.com/1933974

2021-03-05  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Skip pointless extend
	If the base volume is already extended to the maximum size, we sent a
	pointless extend request to the SPM. Normally this would only delay
	the commit by a few seconds, but if the request fails or times out,
	the merge will fail.

	Now we skip EXTEND in this case and start the COMMIT.

	tests: Fix fake getVolumeInfo(), getVolumeSize()
	The real API returns sizes as strings. This is a leftover from the
	time vdsm had an xmlrpc API, which does not support big integers.
	Unfortunately
	this was "fixed" internally by using strings, and when xmlrpc was
	replaced by jsonrpc, the bad return value was not fixed.

	Change getVolumeInfo() and getVolumeSize() to behave like the real API,
	so the test will fail if code is using sizes as integer.

	tests: Extract simulate_volume_extension()
	Replace duplicate code for simulating volume extension with a helper
	function.

	tests: Fix fake blockInfo()
	FakeDomain.blockInfo() was returning constant value for every drive.
	This breaks new volume size calculation when extending the drive. The
	issue was hidden since we did not verify the new requested volume size.

	Change FakeDomain to keep a dict of drive block info, so we can
	simulate volume extension properly when simulating the extend flow.
	Change RunningVM
	to initialize the drive info in the fake domain.

	Test new requested size in active merge test, and fix other tests to
	update also libvirt block info after successful base volume extension.

2021-03-05  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: filedata: use long options for tar, add --verbose
	For clarity short option names were replaced by long names.

	While at it, add the --verbose option to get the list of files. The
	behavior is slightly different for create and for extract though. When
	extracting tar the file list goes to stdout and thus is only logged in
	case of errors. On the other hand, when creating a tar, the file list
	goes to stderr (because the compressed data goes to stdout) and thus
	is always logged.

2021-03-05  Jean-Louis Dupond  <jean-louis@dupond.be>

	virt: Add VM Reset call
	Add a way to reset a VM next to reboot/shutdown.
	This is because you sometimes want to quickly reset a VM that has
	crashed or is OOM, for example. The reset call is much quicker than a
	shutdown/start cycle.

	When we reset the VM we set the status to:
	vmstatus.REBOOT_IN_PROGRESS

	Libvirt will also send a VIR_DOMAIN_EVENT_ID_REBOOT, and this is already
	handled within VDSM.

	Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1927718

2021-03-04  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: filedata: sort entries in tar archive
	Normally there are no guarantees on file ordering in the archive. This
	can make (depending on underlying file system) tar content unstable
	between invocations which can confuse our hash-based comparison for
	external data handling. It also breaks tests because when we do tar --
	untar -- tar operation the initial content can differ from final content
	even when nothing changed on disk.

	To remedy the situation we run tar with '--sort=name' argument to sort
	entries by directory/file name. The --sort argument was added in tar
	1.28 and is available from EL 8.0.
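	A sketch of the resulting tar invocation with long options (command
	construction only; paths are hypothetical):

```python
# Build a deterministic tar command: --sort=name (tar >= 1.28) gives
# a stable entry order, so the hash-based comparison of external
# data is not confused by file system ordering. Paths/names below
# are illustrative.
def tar_create_cmd(archive, directory):
    return [
        "tar",
        "--create",
        "--verbose",        # file list goes to stderr when creating
        "--sort=name",      # stable entry order across invocations
        "--file", archive,
        "--directory", directory,
        ".",
    ]

cmd = tar_create_cmd("/tmp/data.tar", "/var/lib/example")
assert "--sort=name" in cmd
```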

2021-03-03  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.50.8

2021-03-03  Vojtech Juranek  <vjuranek@redhat.com>

	storage: ensure domain is attach before detach
	The previous patch assumes that the storage domain was put into
	maintenance before its detach. Add an assert which validates this
	assumption.

	Bug-Url: https://bugzilla.redhat.com/1928041

	storage: deactivate special LVs after detach
	On a block SD, the LVs and the whole VG are deactivated when the
	domain is put into maintenance. However, during SD detach, special
	LVs are activated again as a result of creating the
	BlockStorageDomain object, and are never properly deactivated. After
	SD detach, we are left with stale DM links, which can cause
	subsequent issues, e.g. a multipath device cannot be removed.

	The proper solution would be to refactor the BlockStorageDomain
	constructor not to activate special LVs in the constructor and
	activate them when needed. This would require lots of changes, with a
	high risk of introducing more issues than it solves.

	Deactivate special LVs after detach. With this fix, the multipath
	device can be removed after removing the storage domain. The
	multipath device is present only in case an SD refresh happens before
	the LUN is removed from the storage server. If no other storage
	domain is on the given storage server, no multipath device for the
	removed SD will appear even after an SD refresh, as the host will log
	out from the storage server.

	Multipath device can be removed with

	multipath -f <WWID>
	echo 1 > /sys/block/<device-name>/device/delete

	which is the proper way to remove a multipath block device.

	Add context manager, which tears down storage domain upon exiting code
	block and use it during SD detach.

	Bug-Url: https://bugzilla.redhat.com/1928041

	storage: add validation that SD is attached
	Add helper method which validates that storage domain is in attached
	state. It will be used in follow-up patch.

	storage: rename StoragePool.validateAttachedDomain()

	storage: rename StoragePool.validatePoolSD()

2021-03-02  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: skip disks validation when redefining a checkpoint
	Checkpoint redefinition uses the backup configuration in order
	to generate the checkpoint XML for libvirt. When creating the backup
	configuration there is a validation for having disks in the backup
	since there is no reason to start a backup without disks.

	For checkpoint redefinition, there might be a case when there is a
	checkpoint that should be redefined but it doesn't contain disks, for
	example when the backed-up disks were removed/detached from the VM.

	The validation for the existence of the disks is now skipped in case of
	using the backup configuration to redefine a checkpoint, also excluding
	the <disks> element from the checkpoint XML.

	Bug-Url: https://bugzilla.redhat.com/1925099

2021-02-24  Nir Soffer  <nsoffer@redhat.com>

	tests: Fix flags handling
	We stored VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT for block job,
	instead of VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT. The current code does not
	check the type, so this issue was hidden.

	When checking flags in blockJobAbort, we use == instead of testing
	the flag bit. This works since this is the only flag used, but it is
	wrong.

	Also update the test to verify job type.

	tests: Test job persistence in all tests
	Verify that job is persisted or unpersisted in internal merge and in
	negative flows tests.

	The new checks reveal that the fake domain metadata was not
	initialized correctly; real metadata always has a jobs element with
	an empty json object.

	tests: Fix variable name, improve docs
	FakeDomain.metadata was created by mistake as _metadata. Group together
	the "fake" variables added for the tests which are not part of virDomain
	interface.

	tests: Replace dict.update() with assignment
	Using

	    d.update({key: value})

	is a convoluted way to say:

	    d[key] = value

	tests: Simplify and document Config class
	The helpers for reading yaml were not helpful and only added unneeded
	state to the class.

	Document the structure of the test data directory to make it clearer
	to new contributors.

	tests: Rename Config.config to Config.values
	It is very confusing to work with config.config. It makes sense because
	the file was named "config.xml". Let's make it clearer by renaming
	the file to values.xml and the variable to values.

	tests: Refer to libvirt block job as block_job
	When we start a merge job we have:
	- job - the vdsm merge job
	- block_job - the libvirt blockCommit block job, started when vdsm job
	  switches to COMMIT state
	- persisted_job - the job info persisted to the vm xml on state changes
	- job_info - information about the vdsm job, presented to engine

	This change renames job to block_job when we refer to the libvirt block job.

	tests: Use image id for prepared volumes
	Recently we changed some IRS methods to include image id when keeping
	prepared volumes, but not all methods were updated.

	Unify the way we keep prepared volumes so all methods use (sd_id,
	img_id, vol_id) as the key. Update the tests using prepared
	volumes to use the new key.

	tests: Move prepared volume setup to RunningVM
	Instead of repeating the setup in every test, do it in
	RunningVM.__init__.

	tests: Unify the way we define test variables
	Define all test variables at the top of the test, in a separate block.
	This code is repeated in many tests, and it is easier to understand
	the tests if they have a common structure.

	tests: Rename image_id to img_id
	This is the idiom used in modern vdsm code, matching sd_id, vol_id.

	tests: Keep file extension in read_files
	Keeping the extension in the dict helps to understand the dict when
	using it in the tests, and helps to find code that refers to a certain
	data file. For example, if you see "00-before.xml" in the code, you
	can look for the file under tests/data/.

	tests: Fix creation of domain descriptor
	Since our xml comes from libvirt, we don't need to use
	XmlSource.INITIAL.

	vm: Fix calls to XMLDesc
	Like setMetadata(), XMLDesc() was called incorrectly and the fake
	implementations were wrong.

	vm: Fix calls to blockCommit and blockJobAbort
	Like setMetadata(), blockCommit and blockJobAbort were called
	incorrectly and the fake implementation was wrong.

	vm: Fix blockInfo calls
	Like setMetadata(), blockInfo() was called incorrectly and the fake
	implementation was wrong.

	vm: Call setMetadata() and metadata() correctly
	The flags argument is a keyword argument, as can be seen in
	libvirt.py:

	    def metadata(self, type, uri, flags=0):

	    def setMetadata(self, type, metadata, key, uri, flags=0):

	But the fake implementation was using:

	    def metadata(self, type, uri, flags):

	    def setMetadata(self, type, metadata, key, uri, flags):

	So all code calling these methods was calling them in the wrong way.
	This tends to work since libvirt is unlikely to add new arguments or
	change their order, but it is bad practice.

	Fix the wrong fakes and update the code to use the right way.
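	The fix described above can be sketched as follows; the class and
	argument values are illustrative, not vdsm's actual fake.

```python
# Hypothetical fake virDomain whose signatures match libvirt.py, so
# callers can pass flags as a keyword or omit it, like with the real
# binding. Names are illustrative.
class FakeVirDomain:
    def __init__(self):
        self._metadata = {}

    # Matches libvirt.py: flags is a keyword argument with a default.
    def metadata(self, type, uri, flags=0):
        return self._metadata[(type, uri)]

    def setMetadata(self, type, metadata, key, uri, flags=0):
        self._metadata[(type, uri)] = metadata


dom = FakeVirDomain()
# The right calling convention: flags by keyword, or left to default.
dom.setMetadata(2, "<vm/>", "ovirt-vm", "http://ovirt.org/vm/1.0")
print(dom.metadata(2, "http://ovirt.org/vm/1.0"))  # <vm/>
```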

	livemerge: Fix call to blockJobInfo()
	The flags argument is a keyword argument. Calling it as a positional
	argument is wrong. This was needed in the past because the fake
	blockJobInfo() was defined incorrectly.

2021-02-24  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.50.7

2021-02-23  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Fix job persistence
	In the past, persisting the jobs when adding or removing a job was
	enough, since job metadata did not contain any state. Now that the job
	keeps internal state (EXTEND, COMMIT, CLEANUP) we must persist the job
	when we change the state.

	Persisting is tricky since vdsm may be killed at any point, and we want
	to be able to recover. We always persist right after changing the job
	state but before starting the actual operation. This way if vdsm is
	killed, we should be able to recover.

	For INIT state, the job is not persisted, so if vdsm is killed the job
	does not exist and nothing has to be done.

	For EXTEND state, if vdsm is killed before extend was started or
	completed, we will treat the extend as a timeout and fail the job. With
	more work we would be able to check if extend completed and start the
	commit, or retry the extend.

	For COMMIT state, if vdsm is killed before starting the commit we will
	detect that the job does not exist in libvirt, and cleanup the job
	without making any changes. With more work we can start the commit.

	For CLEANUP state, if vdsm is killed before we started the cleanup, we
	will detect that no cleanup thread exists, and start a new cleanup.

	The tests now use the parse_jobs(vm) helper to verify that jobs were
	persisted with the right state.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124
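	The persist-then-act ordering described above can be sketched as
	follows; the class and method names are assumptions, not vdsm's exact
	code.

```python
# Sketch: persist the job state right after changing it, but before
# starting the actual asynchronous operation, so a crash in between
# leaves a recoverable record. Names are illustrative.
class Job:
    def __init__(self, job_id):
        self.id = job_id
        self.state = "INIT"


class DriveMerger:
    def __init__(self):
        self.persisted = {}  # stands in for the vm metadata

    def _switch_state(self, job, new_state):
        job.state = new_state
        self.persisted[job.id] = job.state  # persist first...

    def _start_extend(self, job):
        self._switch_state(job, "EXTEND")
        # ...then start the async extend. If vdsm is killed here,
        # recovery sees state=EXTEND and treats it as a timeout.


merger = DriveMerger()
job = Job("5c0a3978")
merger._start_extend(job)
print(merger.persisted[job.id])  # EXTEND
```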

	livemerge: Abort job if extend timed out
	Abort the job if extending the base volume did not complete in 30
	seconds. Engine will detect that the job is gone and that the top
	volume was not removed from the chain.

	Here is an example flow from the new test_extend_timeout:

	[virt.livemerge] Job 5c0a3978-d3c0-4acb-84ee-a52bf0088878 switching state from INIT to EXTEND
	(livemerge:162)

	[virt.livemerge] Starting extend for job=5c0a3978-d3c0-4acb-84ee-a52bf0088878 drive=sda
	volume=75658c40-205d-48f0-87b3-b21530097d76 (livemerge:419)

	[virt.livemerge] Extend timeout for job 5c0a3978-d3c0-4acb-84ee-a52bf0088878, untracking job
	(livemerge:563)

	We will add a more robust recovery mechanism later.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124
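	The timeout check can be sketched like this; the helper and attribute
	names are assumptions, only the 30 second limit comes from the entry
	above.

```python
import time

# Sketch of the extend timeout: if extending the base volume did not
# complete within EXTEND_TIMEOUT seconds, the job is untracked so
# engine sees it as gone. Names are illustrative.
EXTEND_TIMEOUT = 30


class Job:
    def __init__(self):
        self.extend_started = time.monotonic()


def extend_timed_out(job, now=None):
    if now is None:
        now = time.monotonic()
    return now - job.extend_started > EXTEND_TIMEOUT


job = Job()
print(extend_timed_out(job, now=job.extend_started + 31))  # True
print(extend_timed_out(job, now=job.extend_started + 5))   # False
```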

	livemerge: Extend before commit
	Prevent commit failure with ENOSPC when committing a full base volume
	before it was extended.

	The job now has 2 new states:

	- INIT: Job was created but is not tracked yet. This state is typically
	  very short, while caller is blocked in merge().

	- EXTEND: Extending the base volume before commit. Libvirt does not
	  know about this job yet. query_jobs() will report the job without
	  live info (cur=0, end=0) to keep engine monitoring happy. When
	  extend completes, the job switches to COMMIT state.

	Here is an example flow from test_active_merge:

	[virt.livemerge] Job 5c0a3978-d3c0-4acb-84ee-a52bf0088878 switching state from INIT to EXTEND
	(livemerge:159)

	[virt.livemerge] Starting extend for job=5c0a3978-d3c0-4acb-84ee-a52bf0088878 drive=sda
	volume=75658c40-205d-48f0-87b3-b21530097d76 (livemerge:414)

	[root] Extend volume 75658c40-205d-48f0-87b3-b21530097d76 completed <Clock(total=0.00,
	extend-volume=0.00, refresh-volume=0.00)> (vm:1470)

	[virt.livemerge] Extend completed for job 5c0a3978-d3c0-4acb-84ee-a52bf0088878, starting commit
	(livemerge:451)

	[virt.livemerge] Job 5c0a3978-d3c0-4acb-84ee-a52bf0088878 switching state from EXTEND to COMMIT
	(livemerge:159)

	[virt.livemerge] Starting merge with job_id='5c0a3978-d3c0-4acb-84ee-a52bf0088878', original
	chain=75658c40-205d-48f0-87b3-b21530097d76 < 2939852e-187c-48b4-b57a-1f8ea5bc94f8 <
	6c0f1c64-90ac-4a68-839d-5a47af1d3890 (top), disk='sda', base='sda[3]', top=None, bandwidth=0,
	flags=12 (livemerge:377)

	Error handling needs more work:

	- If extend never completes (e.g. a bug in the mailbox or SPM), the
	  job will get stuck forever.

	- If extend fails, the job is aborted. Before this change the job
	  could complete even if extend failed.

	Based on Amit's patch:
	https://gerrit.ovirt.org/c/vdsm/+/112341

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124
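	The state progression above can be sketched as a small table; the
	state values follow the entry, the helper names are assumptions.

```python
# Sketch of the job state progression: INIT -> EXTEND -> COMMIT
# (-> CLEANUP). While in EXTEND, libvirt does not know about the job,
# so query_jobs() reports default live info (cur=0, end=0).
STATES = ["INIT", "EXTEND", "COMMIT", "CLEANUP"]


def next_state(state):
    return STATES[STATES.index(state) + 1]


def live_info(state, libvirt_info=None):
    if state == "EXTEND":
        # Keep engine monitoring happy before the libvirt job exists.
        return {"cur": 0, "end": 0}
    return libvirt_info or {}


print(next_state("INIT"))   # EXTEND
print(live_info("EXTEND"))  # {'cur': 0, 'end': 0}
```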

	vm: Add drive extend callback
	Add a callback argument to extendDriveVolume(). If the callback is
	set, it will be called when the extend completes. If the extend fails,
	the callback will be called with the error.

	We will use this callback to start commit after the base volume was
	extended.

	We don't have any tests for this code, but it is invoked from the live
	merge tests. The normal flow, when the callback is not set, is covered
	by the current livemerge tests. The flow with a callback will be
	tested once we start using the callback.

	Changes based on Amit's patch:
	https://gerrit.ovirt.org/c/vdsm/+/112835

	livemerge: Improve error handling if updating a job fails
	Previously, if we had multiple tracked jobs, for example when deleting
	snapshots with multiple disks, and updating one job raised, we aborted
	the update. Now a failure to update a job will log an exception and
	continue to update the rest of the jobs.

	I did not hit this issue, but looking at the simple update loop in
	query_jobs(), it was very clear that it was missing error handling.
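	The improved loop can be sketched like this; the names are
	illustrative, not vdsm's exact code.

```python
import logging

# Sketch: a failure updating one job is logged and does not abort
# updating the remaining jobs.
def update_jobs(jobs, update):
    for job_id, job in jobs.items():
        try:
            update(job)
        except Exception:
            logging.exception("Error updating job %s", job_id)


updated = []


def update(job):
    if job == "bad":
        raise RuntimeError("update failed")
    updated.append(job)


update_jobs({"1": "good", "2": "bad", "3": "also-good"}, update)
print(updated)  # ['good', 'also-good']
```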

	livemerge: Extract _update_commit() and _update_cleanup()
	Since the code for updating jobs is completely separate for every
	state, there is no need to keep _update_jobs. Extract a method for
	updating a single job for every state, and inline _update_jobs() into
	query_jobs().

	Based on Amit's patch:
	https://gerrit.ovirt.org/c/vdsm/+/112341

	livemerge: Move logging and state change to _start_cleanup()
	We repeated the "Starting cleanup ..." log 4 times. Logging in
	_start_cleanup() makes it simpler.

	The log about retrying cleanup is not really needed since we already log
	the error at the end of the failing cleanup.

	We set the job state to CLEANUP twice. It is nicer if the state is
	changed in _start_cleanup().

	livemerge: Rename _start_cleanup_thread() to _start_cleanup()
	We do cleanup in a cleanup thread, but _start_cleanup() is more
	consistent with _start_commit() and _start_extend().

	livemerge: Simplify _update_jobs()
	Now that we don't have the fragile drive lookup, we can check for
	completed jobs in a uniform way as part of the CLEANUP state branch.

	livemerge: Eliminate fragile drive lookup
	We used to look up the drive object for every job in every
	_update_jobs() iteration. Since drive lookup uses the job disk PDIV,
	it is expected to fail after a pivot, when the drive volumeID changes
	from the top volume to the base volume. We have a fallback to search
	the base volume that probably works in runtime, but fails in the
	tests.

	Eliminate this issue and simplify the code by moving drive lookup to
	_start_cleanup_thread(). This always happens *before* pivot, so the
	drive lookup failure is not possible. Since the lookup is simple now,
	remove _lookup_drive().

	livemerge: Use job.drive instead of drive
	_active_commit_ready() and blockJobInfo() do not use the drive object,
	only the drive name, stored in the job. Removing drive usage will allow
	future refactoring.

	livemerge: Rename cleanThread to cleanup
	Looks a little nicer and avoids breaking long lines.

	livemerge: Cleanup logs and comments in _update_jobs()

	livemerge: Introduce Job.state
	Add Job.state tracking these states:

	- COMMIT: The libvirt block commit job was started. This is the starting
	  state when we start tracking a job.

	- CLEANUP: Cleanup phase has started after the libvirt block commit job
	  has gone, or reported as ready for pivot.

	The new "state" variable replaces the "gone" variable. If the libvirt
	job is gone, we switch to CLEANUP state and clean up the job without
	a pivot.

	For active merge, we switch to CLEANUP state when libvirt reports that
	the job is ready for pivot. We need to check if the libvirt job is ready
	before starting a cleanup thread.

	A nice side effect of this change is fixing a bug where default live
	info (cur=0, end=0) was reported after a successful pivot. Previous
	code tried to get block job info and got an empty dict, so default
	live info was reported. Now we don't query libvirt job info after
	moving to CLEANUP state, so we continue to report the last live info.

2021-02-23  Vojtech Juranek  <vjuranek@redhat.com>

	storage: introduce context manager for special LVs
	Add a context manager for special LVs which activates them before
	usage and deactivates them on exit. Use it in the relevant flow when
	the SD is created. The rest of the cases should be handled by SD
	lifecycle methods.

	Bug-Url: https://bugzilla.redhat.com/1928041
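	The context manager can be sketched with contextlib; the method names
	on the domain object are assumptions.

```python
from contextlib import contextmanager

# Sketch of the special-LV context manager: activate the special LVs
# on entry and deactivate them on exit, even if the body raises.
@contextmanager
def special_volumes(domain):
    domain.activate_special_lvs()
    try:
        yield
    finally:
        domain.deactivate_special_lvs()


class FakeDomain:
    def __init__(self):
        self.active = False

    def activate_special_lvs(self):
        self.active = True

    def deactivate_special_lvs(self):
        self.active = False


sd = FakeDomain()
with special_volumes(sd):
    print(sd.active)  # True
print(sd.active)      # False
```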

	storage: move validation of block sizes into dedicated method
	Make the BlockSD constructor clearer by grouping all block size
	validations together into a dedicated method.

	Bug-Url: https://bugzilla.redhat.com/1928041

	storage: move activation of special LVs into dedicated method
	Block SD needs to activate special LVs, like metadata, xlease etc.,
	before it becomes active. Move activation of these LVs into a
	dedicated method in the SD manifest.

	Also add an opposite method for deactivating special LVs.

	As the manifest shouldn't be accessed directly, also add wrapper
	methods in BlockStorageDomain and call them via the wrappers. Make
	these methods public as they need to be accessed from class methods.

	Bug-Url: https://bugzilla.redhat.com/1928041

	storage: remove metavol from BlockStorageDomain
	BlockStorageDomain doesn't use its `metavol` attribute anywhere.
	Remove it from the class.

	storage: remove imageGarbageCollector from blockSD constructor
	The image garbage collector exists only for file SD and is not
	implemented for block SD. This call does nothing in the blockSD
	constructor and just clutters the code. Remove the call from the
	constructor.

2021-02-23  Nir Soffer  <nsoffer@redhat.com>

	tests: Simplify testing imagetickets
	The fake_connection fixture now returns the connection object, so
	tests can use it directly instead of accessing
	imagetickets.UnixHTTPConnection.

	With shorter error messages, this allows one line setup for many tests.

2021-02-22  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Fix leak of completed cleanup threads
	We never remove cleanup threads from the _cleanup_threads dict. This
	leak was never noticed since we don't have a lot of live merge
	operations, and cleanup threads are small.
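	The fix can be sketched as reaping finished threads from the tracking
	dict; the dict and is_running() are stand-ins for vdsm's actual
	tracking.

```python
# Sketch: remove cleanup threads that finished, so completed threads
# are not leaked in the tracking dict.
class FakeThread:
    def __init__(self, running):
        self._running = running

    def is_running(self):
        return self._running


def reap_finished(threads):
    for job_id in [j for j, t in threads.items() if not t.is_running()]:
        del threads[job_id]


cleanup_threads = {"done": FakeThread(False), "live": FakeThread(True)}
reap_finished(cleanup_threads)
print(sorted(cleanup_threads))  # ['live']
```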

	livemerge: Extract _lookup_drive() helper
	_update_jobs() is still too complicated and long. We need to change it
	to handle job state. Extract the code to find the drive to make this
	easier.

	livemerge: Separate job update and reporting
	Mixing job updates and reporting makes the code more complicated and
	hard to follow. This led to bugs like not reporting jobs if the drive
	was not found, and makes it hard to avoid reporting untracked jobs.

	Extract the update part to _update_jobs(), which updates all jobs and
	does not return anything. query_jobs() calls it and returns a dict of
	job info for engine.

	We could create a separate public API, and make callers call update()
	and query() separately, but callers always want to get the job info
	dict and update at the same time, so we keep the public API as is.

	With this change we report jobs even if the drive was not found. We
	never expect this case, but not reporting the job may confuse engine.

2021-02-19  Roman Bednar  <rbednar@redhat.com>

	dump_volume_chains: remove workarounds
	The workaround in dump_volume_chains.py is no longer needed due to the
	previous patch that changes the behavior of the StorageDomain dump.

	The dump output no longer shows keys with None/null values or empty
	strings but instead leaves those keys out completely.

	Another possible value aside from None/null could be whitespace. This
	is ruled out because lvm does not allow spaces, or escaped space
	sequences, in tags. If this is attempted the lvchange command fails
	with: "Invalid argument for --addtag"

	Bug-Url: https://bugzilla.redhat.com/1870435

	blockSD: omit missing keys when dumping sd
	When a block storage domain dumps information via the .dump() public
	function, it tries to get metadata from slots, directly from the
	"metadata" lv, and if image, parent or md information is missing it
	tries to use lv tags instead. When those are missing as well, we used
	to return None, which is not correct.

	The convention is to leave out the missing
	values/keys from the output completely when
	lookup fails instead of returning None/null.

	Bug-Url: https://bugzilla.redhat.com/1870435
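	The convention can be sketched like this; the key names and the two
	lookup sources are illustrative.

```python
# Sketch: when a value is missing in both the metadata lv and the lv
# tags, leave the key out of the dump completely instead of emitting
# None/null.
def dump_volume(md_slot, lv_tags):
    vol = {}
    for key in ("image", "parent", "mdslot"):
        value = md_slot.get(key)
        if value is None:
            value = lv_tags.get(key)  # fall back to lv tags
        if value is not None:
            vol[key] = value          # omit missing keys entirely
    return vol


dumped = dump_volume({"image": "img-1", "parent": None}, {"mdslot": 7})
print(dumped)  # {'image': 'img-1', 'mdslot': 7}
```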

2021-02-18  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Remove unneeded comments
	Shared volumes are read only and shared with multiple VMs, so it is
	pretty clear why we cannot modify them. The comment is not needed.

	The reason why we need to validate base volume size is documented in
	_validate_base_size() so we don't need to repeat that in merge().

	livemerge: Extract _create_job() helper from merge()
	Creating a job is pretty verbose since we have to construct a dict from
	the drive. It would be nice if engine was sending a useful job
	description but it does not, so we have to create it here.

	With this change merge() fits nicely in one laptop screen and is very
	easy to follow and modify.

	livemerge: Extract code for extending base volume
	Extract _base_needs_extend() and _start_extend() helpers from merge().

	With this change the issue of commit failing because extend did not
	complete is very clear. We start two asynchronous operations:

	    self._start_commit(drive, job)

	    if self._base_needs_extend(drive, base_info):
	        self._start_extend(drive, job, base_info, top_info)

	But the first operation depends on the second. This will be fixed in the
	next phase.

	livemerge: Extract _start_commit() helper
	Now that we reference only job and drive objects, it is very easy to
	extract the code that starts a commit block job to a helper. In this
	patch it just makes the flow easier to follow, but in a future patch
	we will start the commit only after extending the base volume has
	completed.

	livemerge: Rename baseInfo, topInfo
	We use these a lot in merge for validating, refreshing and extending
	the base. Let's use more modern names to match the rest of the code.

	livemerge: Create job object early in merge
	Extracting helpers from merge() with the current code is hard, since we
	have too many variables that need to be passed to the helpers.

	Solve this issue by creating a job object as early as possible and
	using this object instead of referring directly to base, top,
	bandwidth, and job_id.

	To make the job more useful as a way to pass job details, the
	bandwidth was added to the job object. Like other attributes, it is
	now persisted in the vm metadata.

	livemerge: Extract code for refreshing base volume
	Extract the code for refreshing the base volume from merge to make the
	code easier to maintain and understand.

	Based on Amit's patch:
	https://gerrit.ovirt.org/c/vdsm/+/112828

	livemerge: Prepare for extracting merge helpers
	To extend the base volume before we start the merge, we need to extract
	the code for extending the base volume and starting the merge to
	separate helpers.

	Move the code related to starting the merge together, to make it easier
	to extract it in the next patches.

	livemerge: Modernize error handling
	Replace error handlers like:

	   log.error("reason...")
	   return response.error("mergeErr")

	With raising the corresponding error with the relevant context:

	   raise exception.MergeFailed("reason...", job=job_id)

	This is converted to proper response in API.py.

	Remove the return value for the normal case since it is also generated
	by API.py.

	This will make it easy to split the huge merge() method to smaller
	functions without passing around return values.

	The first example is _can_merge_info() converted to
	_validate_base_size() so we can raise the public exception inside it and
	simplify error handling in merge().

	livemerge: Remove unused attributes
	blockJobType was used only when reporting job info, so we can inline the
	constant value in the response. The strategy was never used so we can
	safely remove it.

	livemerge: Improve error handling when job exists
	If a job exists, we logged a wrong message with the image id of the
	new job instead of the existing job, and we logged the issue twice:
	once when we found that the job exists, and a second time when
	handling the error.

	Improve JobExistsError to keep the job and image ids, and raise it
	without any logging. Log the issue once with the correct error message.

	livemerge: Fix job info after libvirt failure
	If calling blockJobInfo() failed, we reported the last job live info
	instead of no live info. Fixed by clearing live info before adding the
	job info to the result.

	Also added the missing test covering this code.

	livemerge: Rename jobs metadata key
	In the vm metadata we still use "block_jobs". Rename to "jobs", and
	update the vm method name to be more consistent with other metadata sync
	methods.

	livemerge: Remove last traces of "block"
	Remove the last traces of "block" from methods, error messages and logs.

	Now block is used only when we refer to libvirt APIs (e.g. blockCommit)
	or block storage.

	livemerge: Clean up DriveMerger instance variables
	Remove unneeded words and convert to lower_case style.

	livemerge: Clean up checking if active commit is ready
	Now that the job keeps the live info, we can simplify the check if the
	active commit is ready.

	- We pass the job to the function instead of live info.
	- The job now tells if it is ready for pivot, so we don't have to peek
	  into live info.
	- The job now tells if it is an active commit, so we don't depend on
	  live info for this.

	livemerge: Improve formatting
	Make _activeLayerCommitReady() clearer and easier to work with by
	replacing indentation with guard clauses, and adding space after
	returns.

	livemerge: Keep job live info in Job
	Previously we got live info from libvirt in query_jobs() and while
	querying the jobs we built a dict for reporting job status to engine.
	This makes the code more complicated and hard to understand.

	We want to separate querying the job from reporting status. The first
	step is to keep the live info in the Job object, and provide an info()
	method returning the info expected by engine.

	livemerge: Make some Job attributes read only
	The basic job details cannot change, so they should be read only to
	avoid accidental changes and make the code clearer.
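	Read-only attributes can be sketched with properties; the attribute
	names follow the surrounding entries, the class is illustrative.

```python
# Sketch: expose basic job details as read-only properties so they
# cannot be changed accidentally after the job is created.
class Job:
    def __init__(self, job_id, drive):
        self._id = job_id
        self._drive = drive

    @property
    def id(self):
        return self._id

    @property
    def drive(self):
        return self._drive


job = Job("job-1", "sda")
print(job.id)  # job-1
try:
    job.id = "other"
except AttributeError:
    print("read only")
```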

2021-02-18  Ales Musil  <amusil@redhat.com>

	net: Log desired state for sriov change

2021-02-18  Benny Zlotnik  <bzlotnik@redhat.com>

	spec: bump sanlock version
	Require sanlock 3.8.2-4 to make lvb bindings usable in order to support
	MBS copy/move.

	Bug-Url: https://bugzilla.redhat.com/1906074

	fakesanlock: fix empty get_lvb behavior
	fakesanlock's behavior should match sanlock when invoking get_lvb
	without invoking set_lvb first. Currently, it fails as we do not have
	the "lvb_data" field when trying to read.

	This patch changes the behavior to return null bytes with the length
	of the requested size.
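	The fixed behavior can be sketched like this; the class and method
	signatures are illustrative, not the sanlock API.

```python
# Sketch: get_lvb without a prior set_lvb returns null bytes of the
# requested size, matching real sanlock, instead of failing on a
# missing "lvb_data" field.
class FakeSanlock:
    def __init__(self):
        self._lvb = {}

    def set_lvb(self, resource, data):
        self._lvb[resource] = data

    def get_lvb(self, resource, size=512):
        data = self._lvb.get(resource, b"")
        # Pad with null bytes and clip to the requested size.
        return data.ljust(size, b"\0")[:size]


fs = FakeSanlock()
print(fs.get_lvb("lease-1", size=4))  # b'\x00\x00\x00\x00'
fs.set_lvb("lease-1", b"ab")
print(fs.get_lvb("lease-1", size=4))  # b'ab\x00\x00'
```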

2021-02-17  Nir Soffer  <nsoffer@redhat.com>

	tests: Fix headers in FakeResponse
	Fix FakeResponse to add content-type header with default content type
	that works for most tests.

	Passing headers to FakeResponse is not very useful since we must
	replicate the code to create a correct content-length. It is much
	easier to modify the good headers created by the fake response
	afterwards.

	We had tests for the 204 "No content" response with and without
	content-length. Since new imageio never returns content-length in this
	case, there is no point in continuing to test the old behavior. Move
	the default behavior to FakeResponse.

2021-02-17  Shani Leviim  <sleviim@redhat.com>

	vm.py: change log from info to warning
	When extending a RAW device, if the volumeInfo apparentsize differs
	from the requested size, this should be logged as a warning instead
	of an info.

	vm.py: require newDiskSize also for LUN update
	The updateDiskSize flow has been changed from the previous patch:
	https://gerrit.ovirt.org/c/vdsm/+/113039/

	- API changes:
	  * Make the newSize parameter required:
	    The getDeviceList operation (required for getting the updated LUN
	    size) will be performed on the engine, for each host with a
	    running guest plugged to the LUN disk.
	    The diskSizeExtend command is called after the LUN size was
	    already synced among all hosts, and its new size is known.

	New command flow:
	1. VM with OS running on a host using a direct LUN disk.
	2. User resizes the LUN on the storage server:
	   lvresize -L+1G /dev/vg1/lv1
	3. User triggers an update of the LUN by pressing the new
	   'Update LUN size' button under the VM Disks sub-tab.
	4. SyncDirectLunCommand is performed on the engine side and updates
	   the LUN on each relevant host using Host.getDeviceList().
	5. Engine sends the VM.diskExtendSize command with the LUN details.
	6. vdsm updates libvirt about the new LUN size.

	Bug-Url: https://bugzilla.redhat.com/1155275

	API.py: Add a 'refresh' flag to getDeviceList()
	The 'refresh' flag indicates whether a storage refresh is required
	during getDeviceList(). The default value is True.
	By passing False, we can get the same info about the LUN without slow
	rescans and without dropping any caches in vdsm.

2021-02-17  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.50.6

2021-02-17  Nir Soffer  <nsoffer@redhat.com>

	livemerge: More clear exception name
	BlockJobActiveError was named after the libvirt error code
	VIR_ERR_BLOCK_COPY_ACTIVE, but it does not describe the issue properly.
	Rename to JobIsNotReadyError to make it more clear.

	livemerge: Don't repeat module name in class name
	Now that LiveMergeCleanupThread is in the livemerge module we can call
	it CleanupThread.

	livemerge: Rename job_id() to find_job_id()
	The new name should make the purpose of this method more clear and it is
	more consistent with similar methods in vm.py.

	livemerge: Replace BlockJob with Job
	The livemerge module is not a generic block job tracking module, so we
	should not use the libvirt term "block job". Remove "block" from
	exception names and methods.

	livemerge: Remove unneeded temporary variables
	Now that we can use self.job.top and self.job.base there is no need to
	create temporary variables. Also fix the indentation that was probably
	the reason for introducing the variables.

	livemerge: Unify base and top volume naming
	We used "baseVolUUID" and "topVolUUID" in the API level, and stored
	"baseVolume" and "topVolume" in the job metadata.

	Unify the terms and use "base" and "top" everywhere. These are the terms
	used by libvirt for blockCommit, and using the same terms and shorter
	names makes the code nicer to work with. These are also the terms used
	in most of the logs.

	livemerge: Unify the terms for job
	We had "jobUUID" in the API level, "jobID" in DriveMerger, and jobID
	in the Job class. Repeating the class name in the instance variable
	does not make sense and makes it harder to work with the code.

	- Change Job.jobID to Job.id
	- Change "jobID" key in dumped job to "id"
	- When we pass around job id, call it job_id
	- When we have a job instance, don't keep separate job_id variable, use
	  job.id.
	- Rename jobUUID to job_id

	livemerge: Replace storedJob with job
	We don't have the concept of stored and non-stored jobs, and it makes
	it harder to maintain and understand the code when we use different
	terms for the same thing in the same code. Now we use the term "job"
	everywhere in the livemerge module.

	livemerge: Replace job dict with a Job class
	To minimize the change and make it safer and easier to review I kept the
	dict key name as class attribute, and made all class attributes
	writable.

	One advantage is removing some constant keys. They are now read only
	properties, so we don't need to pass them around or serialize them.

	To implement serialization for vm metadata, the class supports
	to_dict() and from_dict() methods.

	This change makes the code a little nicer, replacing dict access
	(job["baseVolume"]) with attribute access (job.baseVolume).

	Live info is not part of the class yet, and it also does not know how to
	report job status. This will be added in followup patches.

	The change is based on a much bigger patch from Amit:
	https://gerrit.ovirt.org/c/vdsm/+/112319
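	The to_dict()/from_dict() round trip can be sketched like this; the
	field names are illustrative.

```python
# Sketch: a Job class that serializes to a plain dict for the vm
# metadata and can be rebuilt from it.
class Job:
    def __init__(self, id, drive, base, top):
        self.id = id
        self.drive = drive
        self.base = base
        self.top = top

    def to_dict(self):
        return {"id": self.id, "drive": self.drive,
                "base": self.base, "top": self.top}

    @classmethod
    def from_dict(cls, d):
        return cls(d["id"], d["drive"], d["base"], d["top"])


job = Job("job-1", "sda", "base-vol", "top-vol")
restored = Job.from_dict(job.to_dict())
print(restored.base)  # base-vol
```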

2021-02-16  Benny Zlotnik  <bzlotnik@redhat.com>

	clusterlock: add lvb flag to sanlock#acquire
	A patchset[1] to sanlock introduced the lvb flag for sanlock#acquire,
	allowing the usage of get_lvb and set_lvb.

	[1] https://pagure.io/sanlock/c/4e36aad261de84c44318f4e14549cacb2578d913

2021-02-16  Ales Musil  <amusil@redhat.com>

	automation: Run container for vdsm tests
	Run all tests that are run on travis on PSI jenkins in a container.

2021-02-16  Roman Bednar  <rbednar@redhat.com>

	tests: remove duplicate imagetickets tests
	Some tests became duplicates after adding tests for imageio error
	response parsing.

	Removing those tests does not change coverage.

	Also removing a useless parameter that has no effect.

	Bug-Url: https://bugzilla.redhat.com/1858956

	imagetickets: fix imageio text response parsing
	Since 4.3 imageio returns text errors instead of json, which was
	causing imagetickets._read_content() to raise an ImageDaemonError
	exception.

	Parsing of the response should be moved from _read_content() and done
	in request(), where we parse the response based on content-type (and
	charset).

	When raising errors, the message is decoded based on the charset;
	otherwise bytes are returned to the caller.

	get_ticket() is the only place where json has to be parsed, so it is
	not part of _request() anymore.

	Bug-Url: https://bugzilla.redhat.com/1858956

	imagetickets: refactor request() to private function
	Parsing of the response content has been moved to caller functions,
	and request() should be marked private.

	This requires changes to the tests as well: removing direct testing
	of request() and replacing it with tests of the public functions.

	By testing only the public functions, we never have
	to change the tests when changing the implementation.

	Bug-Url: https://bugzilla.redhat.com/1858956

2021-02-16  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: v2v: do not use direct backend for most import methods
	The only problematic input is still Xen. We have to rely on direct
	backend until all the issues are solved.

	Instances of guestfs are started in session libvirt and not in system
	libvirt. On the bright side those VMs cannot confuse VDSM, but it may
	be an unexpected surprise for those debugging import issues.

	Bug-Url: https://bugzilla.redhat.com/1370055

2021-02-15  Ales Musil  <amusil@redhat.com>

	net, tests: Wait for IP confirmation before dnsmasq run
	Sometimes when dnsmasq is started, the IP of the "server" interface is
	not present. That leads to dnsmasq confusion, as it refuses to respond
	to DHCP requests. Wait for the IP confirmation from netlink before
	starting dnsmasq.

	net, tests: Reuse waitfor module for wait_for_ipv6 in tests
	Instead of using the event monitor, use the waitfor module, which
	already contains wait_for_ipv6.

2021-02-15  Marcin Sobczyk  <msobczyk@redhat.com>

	stdci: Fix specification of d/s agents
	After introducing el8 runs to d/s CI, a change in jenkins was needed
	to make all d/s pipelines have an implicit 'host-distro: same' rule.
	We should not specify 'host-distro' for these ourselves anymore.

2021-02-11  Benny Zlotnik  <bzlotnik@redhat.com>

	fakesanlock: add lvb support
	To match sanlock behavior[1]: support the lvb flag in fakesanlock#acquire,
	add set_lvb and get_lvb

	[1] https://pagure.io/sanlock/c/4e36aad261de84c44318f4e14549cacb2578d913

2021-02-11  Roman Bednar  <rbednar@redhat.com>

	tests: cover text error responses from imageio
	Tests should cover that imagetickets can process
	text error responses from imageio.

	Tests are marked xfail until fixed.

	Bug-Url: https://bugzilla.redhat.com/1858956

2021-02-10  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.50.5

2021-02-10  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: sysprep: run selinux relabeling for new VMs
	When some files that are later created by virt-sysprep are missing in
	the guest, the new files don't necessarily have correct SELinux
	labels. We can use virt-sysprep to notify the selinux subsystem about
	it and enforce relabeling when the new VM is started.

	Bug-Url: https://bugzilla.redhat.com/1860492

2021-02-09  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: allow using scratch disks on shared storage
	Until now, when a backup was created for a VM, a scratch disk was
	created for each disk that participates in the backup. The scratch
	disks were created on the local storage of the host that runs the VM.

	The host local storage is limited and if the storage runs out during
	the backup due to intensive data written to the VM, the VM will
	be paused.

	To avoid this situation, the scratch disks can be created on the
	same shared storage where the backed-up disk resides. To allow that,
	the engine needs to create the scratch disk and prepare it before
	starting the backup, and teardown the disk and remove it when the backup
	ended.

	A new 'scratch_disk' attribute was added for each disk configuration
	parameter in the backup; it will contain all the needed info on the
	scratch disks so it can be used in the backup (disk type, path to the
	prepared disk).

	For example, to start a backup for a VM with scratch disks that already
	were created by the engine the following request structure should be used:

	{
	    "backup_id": "backup-1",
	    "disks": [
	        {
	            "img_id": "disk1",
	            "domain_id": "domain1",
	            "volume_id": "volume1",
	            "checkpoint": false,
	            "backup_mode": "full",
	            "scratch_disk": {
	                "path": "/path/to/disk1/scratch_disk",
	                "type": "file"
	            }
	        },
	        {
	            "img_id": "disk2",
	            "domain_id": "domain2",
	            "volume_id": "volume2",
	            "checkpoint": true,
	            "backup_mode": "incremental",
	            "scratch_disk": {
	                "path": "/path/to/disk2/scratch_disk",
	                "type": "block"
	            }
	        }
	    ],
	    "from_checkpoint_id": "from_checkpoint_id",
	    "to_checkpoint_id": "to_checkpoint_id"
	}

	Bug-Url: https://bugzilla.redhat.com/1874483

2021-02-09  Tomáš Golembiovský  <tgolembi@redhat.com>

	supervdsm: enable compression for NVRAM data
	Compressing the data with bzip2 makes the final (base64 encoded) string
	around 7x smaller.

	Instead of enabling the compression only for NVRAM we change the
	default to always have compression. In case of TPM we explicitly disable
	the compression for backward-compatibility. Ideally we would like to
	enable compression also for TPM in the future.
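	The effect of bzip2-before-base64 can be sketched as follows; this is
	an illustrative snippet, not vdsm code, and the ~7x ratio depends on
	how repetitive the data is.

```python
import base64
import bz2

# Compress external data with bzip2 before base64-encoding it; the
# compress flag mirrors the per-kind choice described above (disabled
# for TPM for backward compatibility).
def encode(data, compress=True):
    if compress:
        data = bz2.compress(data)
    return base64.b64encode(data).decode("ascii")

def decode(blob, compressed=True):
    data = base64.b64decode(blob)
    return bz2.decompress(data) if compressed else data
```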

	virt: add support for compression of external data

	virt: filedata: do not use legacy interface to base64
	We don't have any reason to be lenient and silently accept invalid data.
	The legacy interface however does not provide data validation on input.

2021-02-08  Vojtech Juranek  <vjuranek@redhat.com>

	api: remove obsolete LegacyCdromPath type
	`LegacyCdromPath` has been deprecated for a very long time and the
	current (as well as past) implementation would fail if this type were
	used. Remove it.

	caps: add capability for specifying CD via PDIV
	Support for specifying a CD via PDIV was added in previous patches;
	this is the preferred way to specify the image. To let the engine know
	that the host is able to work with images specified by PDIV, add a new
	capability `cd_change_pdiv`, simply set to True if the host supports
	this functionality.

	Bug-Url: https://bugzilla.redhat.com/1589763

	api: add DriveSpecVolumeCdrom CD volume specification
	To be able to use block based images for CDs, we should specify them
	using PDIV. Add `DriveSpecVolumeCdrom`, which allows specifying a CD
	using PDIV. To keep backward compatibility, keep the old spec using a
	path, but rename it to `DriveSpecPathCdrom`, and convert
	`DriveSpecCdrom` into a union type which unifies the old type and the
	new one.

	Bug-Url: https://bugzilla.redhat.com/1589763

2021-02-04  Eyal Shenitzky  <eshenitz@redhat.com>

	add_bitmap.py: perform operation validation after chain prepared
	To validate that the bitmap doesn't exist in the volume chain, the
	'qemu-img info' command is used. For block-based volumes, the command
	would fail because the validation was done before the preparation of
	the chain.

	This is now fixed by moving the validation to occur only after the
	chain is prepared.

2021-02-04  Nir Soffer  <nsoffer@redhat.com>

	automation: Log installed packages versions
	We are fighting with CI and repos trying to get decent qemu-kvm and
	qemu-img so we don't xfail and skip many import tests, but it is very
	hard to understand what's going on since we don't have enough
	visibility into the slave packages.

	Log the installed qemu-kvm and qemu-img versions at the start of the
	build to make this easy.

2021-02-04  Roman Bednar  <rbednar@redhat.com>

	tests: convert vdsmdumpchains_test to pytest
	Converting all tests in a module to pytest.

	The attribute holding the exception value in the pytest.raises
	context manager changed to "value"; it used to be "exception".

	Methods have been converted to functions
	as the class is not needed anymore.

	Converted to py3 which required only
	removal of __future__ imports.

	Checked with 2to3 utility and "python3.6
	-m compileall -q vdsmdumpchains_test.py"

	Bug-Url: https://bugzilla.redhat.com/1870435

2021-02-03  Marcin Sobczyk  <msobczyk@redhat.com>

	New release: 4.40.50.4

2021-02-03  Vojtech Juranek  <vjuranek@redhat.com>

	tests: remove obsolete CD test
	Test test_change_cd_pdiv() was added to have at least some test before
	starting with CD related changes. In previous patches better tests
	which cover this scenario were added. Moreover, this test has an
	issue: it passed a PDIV into the 'path' parameter. This worked and was
	done intentionally to have a way to test image activation, but the
	test can now be removed.

	vm: use new _change_cd() when engine sends CD spec as PDIV
	Switch to the new private Vm._change_cd() method in the public
	Vm.changeCD() when the engine specifies a CD using PDIV. For backward
	compatibility, when the engine hasn't been upgraded yet and specifies
	a CD by sending its path, use the old method.

	Update tests to use public Vm.changeCD() method instead of private one.

	Bug-Url: https://bugzilla.redhat.com/1589763

	vm: always return empty list from changeCD
	This list is required by the API and the engine expects it. We always
	return an empty list. As we added a new function for CD change, to
	avoid duplicating the code and make it more clear, return this empty
	list from the calling function instead of from the method which does
	the actual change.

	Bug-Url: https://bugzilla.redhat.com/1589763

	vm: add new private change_cd method
	Add a new implementation of the change CD functionality. The method is
	private as it will be called only for the corresponding cluster
	version and when the CD volume is specified using PDIV. Otherwise the
	old implementation will be called to keep backward compatibility;
	this is also the reason to keep the old implementation.

	The main difference is that the new method adds CD metadata about the
	loaded CD. In case of failure we can use this metadata to recover the
	CD and also properly tear down the unused volumes.

	Bug-Url: https://bugzilla.redhat.com/1589763

	vm: add helper methods for updating CDROM metadata
	To support working with CDs on block storage domains, add new helper
	methods which add or remove the metadata `change` element with
	information about the volume being attached as a CDROM to the VM.

	Before we start inserting the CD, which on a block SD means we have to
	activate the volume, add a `change` element into the CDROM metadata.
	In case vdsm fails before finishing this operation, we know that the
	volume should be deactivated during restore. If everything goes well,
	the `change` element is removed from the metadata at the end of the
	CD insert operation.

	Whole process of CD change looks like this:

	1. Add PDIV for change CD into CDROM metadata.
	2. Prepare volume to be loaded.
	3. Change the CD using libvirt call.
	4. Tear down ejected volume.
	5. Update CDROM metadata and remove `change` element from the metadata.

	Helper methods added in this patch cover steps 1. and 5.
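	The five steps above can be sketched as a small function; the helper
	names (prepare_volume, attach_cd, teardown_volume) are illustrative
	stand-ins, not the actual vdsm API.

```python
# Hypothetical sketch of the CD change flow; metadata is the CDROM
# metadata dict, pdiv the PDIV of the new CD volume.
def change_cd(metadata, pdiv, prepare_volume, attach_cd, teardown_volume):
    metadata["change"] = dict(pdiv)      # 1. record the intended change
    path = prepare_volume(pdiv)          # 2. prepare the new volume
    old_path = attach_cd(path)           # 3. change the CD via libvirt
    if old_path:
        teardown_volume(old_path)        # 4. tear down the ejected volume
    metadata.update(pdiv)                # 5. commit the new PDIV and
    del metadata["change"]               #    drop the `change` element
    return metadata
```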

	Bug-Url: https://bugzilla.redhat.com/1589763

	virt: add `change` key into metadata
	Add a `change` key into the virt metadata. This element will be used
	during a change of the CD in the CDROM and its purpose is to store
	information about the volume being loaded as the CD. This information
	can be used in the recovery process to (de)activate volumes on block
	storage. E.g. if vdsm fails during a CD change on block storage once
	we have activated the new CD volume but haven't deactivated the old
	CD volume yet, we end up with two active volumes and have to
	deactivate one of them. So far we have kept information only about
	the loaded CD. With this change we keep information about the old as
	well as the new CD and act accordingly during the recovery process.
	Once the new CD is successfully changed by libvirt and the old CD
	volume is deactivated, the metadata will be updated with information
	about the current CD and the `change` element will be removed.

	Current implementation
	======================

	Currently, change CD is exposed as part of the vdsm API and accepts
	two parameters from the caller: vm ID, being the UUID of the VM, and
	driveSpec, being the specification of the new CD. driveSpec is a
	dictionary with the following items: `iface`, `index` and `path`.
	`iface` and `index` specify the CDROM device, `path` specifies the
	location (path) of the new CD on the disk as seen by the host. An
	empty path (i.e. `path=""`) means that the CD should be ejected.
	Example of the driveSpec for changing a CD:

	    {
	        'iface': 'sata',
	        'path': '/rhev/data-center/mnt/blockSD/5f2f58dd-4c46-4f38-88d3-d6a78973c0e2/images/2eb0aa33-f5ff-4fbf-a149-da17c4d191d9/28c61447-bcd4-48e8-b6d4-2613d4a50bff',
	        'index': '2'
	    }

	Example of the driveSpec when the CD is ejected:

	    {
	        'iface': 'sata',
	        'path': '',
	        'index': '2'
	    }

	When the CD is loaded, metadata with PDIV information and volume chain
	is stored into VM metadata. Example of metadata:

	    <ovirt-vm:device devtype="disk" name="sdc">
	        <ovirt-vm:domainID>88252cf6-381e-48f0-8795-a294a32c7149</ovirt-vm:domainID>
	        <ovirt-vm:imageID>89f05c7d-b961-4935-993f-514499024515</ovirt-vm:imageID>
	        <ovirt-vm:poolID>13345997-b94f-42dd-b8ef-a1392f65cebf</ovirt-vm:poolID>
	        <ovirt-vm:volumeID>626a493f-5214-4337-b580-96a1ce702c2a</ovirt-vm:volumeID>
	        <ovirt-vm:volumeChain>
	            <ovirt-vm:volumeChainNode>
	                <ovirt-vm:domainID>88252cf6-381e-48f0-8795-a294a32c7149</ovirt-vm:domainID>
	                <ovirt-vm:imageID>89f05c7d-b961-4935-993f-514499024515</ovirt-vm:imageID>
	                <ovirt-vm:leaseOffset type="int">105906176</ovirt-vm:leaseOffset>
	                <ovirt-vm:leasePath>/dev/88252cf6-381e-48f0-8795-a294a32c7149/leases</ovirt-vm:leasePath>
	                <ovirt-vm:path>/rhev/data-center/mnt/blockSD/88252cf6-381e-48f0-8795-a294a32c7149/images/89f05c7d-b961-4935-993f-514499024515/626a493f-5214-4337-b580-96a1ce702c2a</ovirt-vm:path>
	                <ovirt-vm:volumeID>626a493f-5214-4337-b580-96a1ce702c2a</ovirt-vm:volumeID>
	            </ovirt-vm:volumeChainNode>
	        </ovirt-vm:volumeChain>
	    </ovirt-vm:device>

	Proposed changes
	================

	To be able to work with images on a block SD, we need the PDIV of the
	images. Therefore it would be useful to request this information
	directly from the caller instead of `path`. The API should still
	require a dict with `iface` and `index`, but the caller should
	provide a dict, `drive`, with the PDIV information. To keep backward
	compatibility, `path` should still be a valid option. The
	`DriveSpecCdrom` needs to be extended with an attribute `drive` of
	type `DriveSpecVolume` with default value `null` to keep backward
	compatibility.

	As `path` is kept only for backward compatibility and won't be used
	by the new implementation, for new calls it will be removed from the
	RPC call dict.

	A request to eject the CD will be done by not providing PDIV
	information in the `drive` dict, i.e. by sending `null` for the
	`drive`.

	Example of the newly proposed structure of the dict:

	    {
	        'iface': 'sata',
	        'index': '2',
	        'drive': {
	            'device': 'cdrom',
	            'poolID': '13345997-b94f-42dd-b8ef-a1392f65cebf',
	            'domainID': '88252cf6-381e-48f0-8795-a294a32c7149',
	            'imageID': '89f05c7d-b961-4935-993f-514499024515',
	            'volumeID': '626a493f-5214-4337-b580-96a1ce702c2a'
	        }
	    }

	Example of the dict when a CD eject is requested:

	    {
	        'iface': 'sata',
	        'index': '2',
	        'drive': null
	    }

	Information from the dict will be stored in <change> element of the
	metadata and after successful change of CD, the <change> element will
	be removed and new PDIV will be stored in CD metadata instead of old
	PDIV.

	As the current implementation of metadata doesn't allow storing empty
	elements, we need to store an additional element <state> to know what
	the currently running operation is. Without this, when ejecting a CD,
	the <change> element would be empty and thus not stored, and in case
	of failure we would lose the information that a CD eject was in
	progress.

	As we don't actually need the volumeChain element for anything, it
	can be removed. The final CD metadata will look like this when a CD
	change is in progress:

	    <ovirt-vm:device devtype="cdrom" name="sdc">
	        <ovirt-vm:domainID>88252cf6-381e-48f0-8795-a294a32c7149</ovirt-vm:domainID>
	        <ovirt-vm:imageID>89f05c7d-b961-4935-993f-514499024515</ovirt-vm:imageID>
	        <ovirt-vm:poolID>13345997-b94f-42dd-b8ef-a1392f65cebf</ovirt-vm:poolID>
	        <ovirt-vm:volumeID>626a493f-5214-4337-b580-96a1ce702c2a</ovirt-vm:volumeID>
	        <ovirt-vm:change>
	                <ovirt-vm:state>loading</ovirt-vm:state>
	                <ovirt-vm:domainID>09a0c152-7b9a-44ae-866c-c6486d3e10c9</ovirt-vm:domainID>
	                <ovirt-vm:imageID>1e1ff19c-01d0-432c-97c2-5883fc833ce4</ovirt-vm:imageID>
	                <ovirt-vm:poolID>fb7797bd-3cca-4baa-bcb3-35e8851015f7</ovirt-vm:poolID>
	                <ovirt-vm:volumeID>25b29335-8759-4464-a35f-023bcfd922c8</ovirt-vm:volumeID>
	        </ovirt-vm:change>
	    </ovirt-vm:device>

	If the change succeeds, the metadata will be:

	    <ovirt-vm:device devtype="cdrom" name="sdc">
	        <ovirt-vm:domainID>09a0c152-7b9a-44ae-866c-c6486d3e10c9</ovirt-vm:domainID>
	        <ovirt-vm:imageID>1e1ff19c-01d0-432c-97c2-5883fc833ce4</ovirt-vm:imageID>
	        <ovirt-vm:poolID>fb7797bd-3cca-4baa-bcb3-35e8851015f7</ovirt-vm:poolID>
	        <ovirt-vm:volumeID>25b29335-8759-4464-a35f-023bcfd922c8</ovirt-vm:volumeID>
	    </ovirt-vm:device>

	When ejecting CD is in progress, the metadata will be:

	    <ovirt-vm:device devtype="cdrom" name="sdc">
	        <ovirt-vm:domainID>88252cf6-381e-48f0-8795-a294a32c7149</ovirt-vm:domainID>
	        <ovirt-vm:imageID>89f05c7d-b961-4935-993f-514499024515</ovirt-vm:imageID>
	        <ovirt-vm:poolID>13345997-b94f-42dd-b8ef-a1392f65cebf</ovirt-vm:poolID>
	        <ovirt-vm:volumeID>626a493f-5214-4337-b580-96a1ce702c2a</ovirt-vm:volumeID>
	        <ovirt-vm:change>
	                <ovirt-vm:state>ejecting</ovirt-vm:state>
	        </ovirt-vm:change>
	    </ovirt-vm:device>

	If ejecting the CD succeeds, there won't be any CD related metadata,
	as we have nothing to store there.

	Bug-Url: https://bugzilla.redhat.com/1589763

2021-02-03  Milan Zamazal  <mzamazal@redhat.com>

	virt: Remove extra space after lambda in Vm.__init__

2021-02-02  Nir Soffer  <nsoffer@redhat.com>

	livemerge: Make the return value more clear
	Rename meaningless (and ugly) jobsRet to tracked_jobs. After spending
	some time reading this code, I'm pretty sure that I understand how it
	works, and this makes this code more clear.

	In the unlikely case when we don't find the job drive, we don't report
	the job info. This looks like a bug that we need to think about later.

	livemerge: More clear dict formatting
	We have 2 very similar and confusing dicts in the module:
	- job: kept in self._blockJobs, used to manage the job state.
	- entry: dict with job info, returned from queryBlockJobs, used to
	  report job status to engine.

	Format both dicts using one key: value per line, sorted by key.

	To make the purpose more clear, rename the meaningless "entry" dict to
	"job_info".

	livemerge: Make queryBlockJob more readable
	Separate blocks of code to make the flow easier to follow, and minimize
	the next patches in this complex code.

	tests: Rename test module after the actual module
	Using consistent names <modulename>_test.py makes it easier to work with
	the code.

	livemerge: Move startCleanup inline function to class
	Inline functions make the code more complicated and fragile, and in this
	case queryBlockJobs is too big and complex and moving the function out
	of it will make it easier to refactor.

	virt: Fix circular imports in livemerge, snapshot
	When snapshot and livemerge modules were created, the code was moved
	from vm module, creating circular dependencies. Because the vm module is
	imported early during runtime, the dependency was hidden. But when
	trying to run the new merge_test.py manually, collecting the tests
	fails with:

	______________ ERROR collecting tests/virt/merge_test.py _______________
	ImportError while importing test module 'tests/virt/merge_test.py'.
	Hint: make sure your test modules/packages have valid Python names.
	Traceback:
	/usr/lib64/python3.8/importlib/__init__.py:127: in import_module
	    return _bootstrap._gcd_import(name[level:], package, level)
	virt/merge_test.py:31: in <module>
	    from vdsm.virt.livemerge import (
	../lib/vdsm/virt/livemerge.py:33: in <module>
	    from vdsm.virt import vm
	../lib/vdsm/virt/vm.py:66: in <module>
	    from vdsm.virt.livemerge import DriveMerger
	E   ImportError: cannot import name 'DriveMerger' from partially initialized module
	'vdsm.virt.livemerge' (most likely due to a circular import) (lib/vdsm/virt/livemerge.py)

	The issue was StorageUnavailableError, defined in the vm module. Fix
	by introducing the vdsm.virt.errors module, a place for internal
	errors shared by the virt package.

	The new module now contains only StorageUnavailableError, to solve
	the circular dependency. We can move all the other internal errors
	there to clean up the mess.
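	The pattern can be sketched as follows: shared exceptions live in a
	leaf module with no virt imports, so vm.py and livemerge.py can both
	import it without importing each other. The docstring here is
	illustrative, only the class name comes from the text.

```python
# Sketch of vdsm/virt/errors.py: a dependency-free home for internal
# errors shared across the virt package.
class StorageUnavailableError(Exception):
    """Raised when a drive's backing storage cannot be accessed."""

# Both vm.py and livemerge.py would then do:
#     from vdsm.virt.errors import StorageUnavailableError
# instead of importing each other.
```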

2021-02-02  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: add support for writing NVRAM data
	Bug-Url: https://bugzilla.redhat.com/1669178

	virt: add support for reading NVRAM data
	Bug-Url: https://bugzilla.redhat.com/1669178

	virt: supervdsm: add support for reading and writing NVRAM data
	Add functionality to read and write NVRAM data. Files are stored in
	libvirt directory. Normally the directory is created by libvirt. In rare
	cases the directory may not exist yet and we have to create it. This is
	probably needed only when the first UEFI VM started on the host is the
	one with Secure boot enabled and stored NVRAM.

	Bug-Url: https://bugzilla.redhat.com/1669178

	virt: periodic: add base class for external data
	Create new base class for external data to make adding other data kinds
	easier.

	Bug-Url: https://bugzilla.redhat.com/1669178

	virt: vm: make method for initializing external data generic
	Bug-Url: https://bugzilla.redhat.com/1669178

	virt: vm: turn update_tpm() into generic function
	After the previous refactoring update_tpm() is now a stub and does not
	need to be purely for TPM. By adding a kind argument it can easily be
	turned into a generic function for updating various external data.

	Bug-Url: https://bugzilla.redhat.com/1669178

	virt: vm: wrap external data into a class
	The methods for handling TPM can be reused for other external data
	(NVRAM coming soon). To make adding new external data easier this patch
	wraps everything into an object and collects the external data in a
	dictionary.

	Rather than trying to handle all data kinds together we prefer to focus
	on one kind at a time. That way the flows can be kept separate and in
	case returning some data fails or blocks for some reason it won't
	interfere with retrieval of other kinds.

	Bug-Url: https://bugzilla.redhat.com/1669178

2021-02-02  Eyal Shenitzky  <eshenitz@redhat.com>

	caps.py: support offline VM backup
	Bug-Url: https://bugzilla.redhat.com/1891470

2021-02-01  Nir Soffer  <nsoffer@redhat.com>

	nbd: Support exporting bitmap from backing chain
	When exporting a bitmap from backing chain, we cannot use qemu-nbd
	--bitmap option, since it exports only the bitmap from the top image.

	To export bitmap from the entire chain, we create an overlay on top of
	the actual volume, add empty bitmap to the overlay, and merge the bitmap
	from all the backing chain nodes into the overlay bitmap. Finally we
	export the overlay instead of the actual volume.

	To allow this mode, path verification was changed to allow a transient
	disk with backing file in the storage repository.

	The operation may fail with new storage bitmap exceptions if the bitmap
	is invalid or missing in some of the nodes, or does not exist in any
	node. Engine should delete the bitmap from stored checkpoints when
	handling these errors.

	We already have bitmap exceptions in the generic vdsm exceptions module,
	but they cannot be used in storage code, so I added new bitmap
	exceptions in the storage exceptions module.
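	The overlay approach can be sketched as the following command builder;
	the qemu-img flag spelling ("bitmap --add", "--merge ... -b ... -F")
	is an assumption based on the qemu-img bitmap subcommand and may
	differ from vdsm's actual implementation.

```python
# Build the qemu-img calls for merging a bitmap from the whole backing
# chain into an empty bitmap on a transient overlay (illustrative only;
# the commands are constructed, not executed, here).
def overlay_bitmap_commands(overlay, backing_chain, bitmap):
    # Add an empty bitmap to the overlay...
    cmds = [["qemu-img", "bitmap", "--add", overlay, bitmap]]
    # ...then merge the bitmap from every node of the backing chain.
    for node in backing_chain:
        cmds.append(["qemu-img", "bitmap", "--merge", bitmap,
                     "-b", node, "-F", "qcow2", overlay, bitmap])
    return cmds
```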

2021-02-01  Shani Leviim  <sleviim@redhat.com>

	vm.py: avoid hiding errors in case of exceptions

	virt: extend diskSizeExtend for updating LUN disks
	This patch extends the diskSizeExtend command so it can update the
	size of a specified LUN disk attached to a VM while the VM is running:

	- API changes:
	  * Making the newSize parameter optional:
	    This value is needed only for DriveSpecVolume.
	  * Changing driveSpecs to *DriveSpec type:
	    The DriveSpecVolume can be used only to pass PDIV drives.
	    To support LUNs, the engine must pass a DriveSpecGUID.
	    Therefore, to support multiple types of drives,
	    the type was changed to *DriveSpec.

	- The LUN's new size may not be visible on the host after resizing on
	  the storage server: by calling
	  getDeviceList(guids=(drive.GUID,), checkStatus=False), we refresh
	  the LUN size on the host running the VM and get the updated size.

	Command Flow:
	1. A VM with an OS is running on a host using a direct LUN disk.
	2. The user resizes the LUN on the storage server:
	   lvresize -L+1G /dev/vg1/lv1
	3. The user triggers an update of the LUN by pressing the new
	   'Update LUN size' button under the VM Disks sub-tab.
	4. The engine sends the VM.diskSizeExtend command with the LUN details.
	5. Vdsm refreshes the LUN size using Host.getDeviceList().
	6. Vdsm updates libvirt about the new LUN size.

	Verification flow (example):
	- Verify size on the host:
	root@apple-tlv-redhat-com ~ # multipath -ll /dev/mapper/360014053f6692b7b2b34570855cbc982
	360014053f6692b7b2b34570855cbc982 dm-2 LIO-ORG,block_backend_l
	size=7.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
	`-+- policy='service-time 0' prio=50 status=active
	  `- 8:0:0:0  sda 8:0   active ready running

	- Verify size on the VM:
	$ lsblk /dev/sda
	NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
	sda    8:0    0   7G  0 disk

	- Resize LUN on the server:
	root@galactica ~ # lvresize -L+1G /dev/vg1/lv1
	  Size of logical volume vg1/lv1 changed from 7.00 GiB (1792 extents) to 8.00 GiB (2048 extents).
	  Logical volume vg1/lv1 successfully resized.

	- Execute the diskSizeExtend command on the host:
	root@apple-tlv-redhat-com ~ # cat args.json
	{
	  "vmID": "23b7bae6-a7eb-4ee1-84a9-524c408be245",
	  "driveSpecs":
	  {
	    "GUID": "360014053f6692b7b2b34570855cbc982"
	  }
	}

	root@apple-tlv-redhat-com ~ # vdsm-client -f args.json VM diskSizeExtend
	"8589934592"

	- Verify the new size on the host:
	360014053f6692b7b2b34570855cbc982 dm-2 LIO-ORG,block_backend_l
	size=8.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
	`-+- policy='service-time 0' prio=50 status=active
	  `- 8:0:0:0  sda 8:0   active ready running

	- Verify the new size on the VM:
	$ lsblk /dev/sda
	NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
	sda    8:0    0   8G  0 disk

	Bug-Url: https://bugzilla.redhat.com/1155275

2021-02-01  Nir Soffer  <nsoffer@redhat.com>

	requirements: Use latest tox
	Similar to pytest and yappi, let's use the latest tox to enjoy the
	latest features and fixes earlier.

	requirements: Require latest yappi
	Recently we removed the yappi version from tox.ini, but left the
	version in docker/requirements.txt. This file is used when building
	the containers and for creating a local development environment.

2021-01-31  Shani Leviim  <sleviim@redhat.com>

	README: remove reference for updating fc30
	Since the automation/check-patch.packages.fc30 file no longer exists,
	this mention should be removed.

2021-01-28  Nir Soffer  <nsoffer@redhat.com>

	nbd: Validate export bitmap for writable volume
	Exporting bitmap is relevant only for incremental backup. Fail
	NBD.start_server() with UnsupportedOperation if server is started with
	bitmap and readonly=False.

	nbd: Validate export bitmap with raw volume
	Exporting a bitmap from raw volume is not possible. Fail
	NBD.start_server() with UnsupportedOperation if called for raw volume.

	nbd: Validate exporting bitmap and backing_chain
	When exporting a bitmap we must expose the entire backing chain. Fail
	NBD.start_server() with UnsupportedOperation if both bitmap and
	backing_chain=False are specified.
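	Taken together, the three validations above amount to something like
	the following sketch; the parameter names are illustrative, only the
	UnsupportedOperation semantics come from the text.

```python
class UnsupportedOperation(Exception):
    pass

# Reject bitmap exports that cannot work: writable exports, raw volumes,
# and exports that hide the backing chain.
def validate_bitmap_export(bitmap, readonly, vol_format, backing_chain):
    if bitmap is None:
        return
    if not readonly:
        raise UnsupportedOperation("bitmap requires a readonly export")
    if vol_format == "raw":
        raise UnsupportedOperation("cannot export a bitmap from a raw volume")
    if not backing_chain:
        raise UnsupportedOperation("bitmap requires backing_chain=True")
```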

	supervdsm: Allow overriding storage transient_disks
	When testing nbd server with bitmaps, we need to create and export a
	transient disk on top of the original disk. Since the tests cannot
	monkeypatch supervdsm process, we need to run supervdsm with custom
	transient disk directory, in the same way we override the data-center
	directory.

2021-01-28  Liran Rotenberg  <lrotenbe@redhat.com>

	sampling: consider claimable memory
	A while ago we added the SReclaimable memory as part of free memory in
	commit 31272c3178, but we didn't consider it in the memory usage.
	This created a discrepancy in the reported host memory.
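	The fix can be sketched as follows; the field names are the usual
	/proc/meminfo ones (values in kB), and the function is illustrative
	rather than vdsm's actual sampling code.

```python
# Count SReclaimable as free on the usage side as well, so that used
# memory is consistent with the reported free memory.
def mem_used_kb(meminfo_text):
    fields = {}
    for line in meminfo_text.splitlines():
        name, rest = line.split(":", 1)
        fields[name.strip()] = int(rest.split()[0])
    free = (fields["MemFree"] + fields["Buffers"] + fields["Cached"]
            + fields["SReclaimable"])
    return fields["MemTotal"] - free
```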

	Bug-Url: https://bugzilla.redhat.com/1916519

2021-01-28  Nir Soffer  <nsoffer@redhat.com>

	storage.exceptions: Fix bad message
	We are using "msg" attribute instead of deprecated "message" since:

	commit 51490e0c2517234cb829cb63484286e1005950ec

	    py3/storage: Use `msg' instead of `message' in exceptions

	But when adding the transientdisk module in:

	commit 787556f94f179454d56e271daee257e4553c7286

	    storage: introduce transientdisk module

	A new exception was added using "message". This shows the "msg"
	attribute inherited from StorageException instead of the class
	message.
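	A minimal illustration of the bug class (names other than
	StorageException are made up for the example):

```python
# The base storage exception reads its text from "msg"; a subclass that
# sets the deprecated "message" attribute leaves the generic inherited
# "msg" visible instead of its own text.
class StorageException(Exception):
    msg = "Generic storage error"

class BrokenError(StorageException):
    message = "Specific error text"   # wrong attribute; "msg" unchanged

class FixedError(StorageException):
    msg = "Specific error text"       # overrides the inherited msg
```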

2021-01-27  Vojtech Juranek  <vjuranek@redhat.com>

	vm: refactor xml creation from Vm._changeBlockDev()
	Move preparation of disk device XML from VM._changeBlockDev() to
	dedicated method.

	VM XML is already tested in test_change_loaded_cd() and thus no new test
	is added in this patch.

	Bug-Url: https://bugzilla.redhat.com/1589763

	vm: refactor Vm._changeBlockDev()
	Break Vm._changeBlockDev() into smaller independent methods which can be
	reused for implementation of other methods like insert_cd() method.
	Start with moving libvirt calls into dedicated method.

	Bug-Url: https://bugzilla.redhat.com/1589763

	tests: add test change loaded CD
	Before refactoring VM._changeBlockDev(), add test for changing loaded
	CD. Tests for ejecting CD and CD change failure when path to CD is not
	valid are already present.

	Bug-Url: https://bugzilla.redhat.com/1589763

	tests: add fixture for VM with CD loaded
	To make testing of changing a CD easier, provide a fixture which will
	return a VM with a CD already loaded into the CDROM drive.

	tests: use also image ID for storing prepared volumes
	The current implementation of fake prepared volumes would fail for
	teardownImage() as it doesn't pass vol_id into the call, resulting in
	calling the fake teardownImage() with vol_id=None:

	    File "/home/vjuranek/ovirt/vdsm/tests/virt/vmfakelib.py", line 99, in teardownImage
	        del self.prepared_volumes[(sdUUID, volUUID)]
	    KeyError: ('d3adaee4-34d6-4691-a51d-48dd9096091f', None)

	Add image ID into the prepared volumes key, so that the key is now
	(sd_id, img_id, vol_id).

	In real code the volume ID is never used and all volumes belonging to
	the image are deactivated; therefore, also in the fake
	implementation, all the volumes for the specified image are removed.
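	The fake-side fix can be sketched like this; prepared_volumes is a
	plain dict keyed by (sd_id, img_id, vol_id) as described above, and
	the function name is illustrative.

```python
# Remove every prepared volume belonging to the image, since the real
# teardownImage() receives no volume ID.
def teardown_image(prepared_volumes, sd_id, img_id):
    for key in list(prepared_volumes):
        if key[:2] == (sd_id, img_id):
            del prepared_volumes[key]
```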

	tests: add tests for utils.isVdsmImage()

2021-01-27  Nir Soffer  <nsoffer@redhat.com>

	tests: Support creating volume with parent
	Add parent option to nbd_test.create_volume(), to allow testing volumes
	with backing chains.

2021-01-27  Eyal Shenitzky  <eshenitz@redhat.com>

	vdsm-api.yml: change DiskType name to DiskContentType
	DiskType actually reflects the type of content in the disk (data,
	OVF, etc.) while the disk type is needed to identify whether the disk
	is block-based, file-based, network, etc.

	This patch changes only the name of the object in the schema and
	doesn't break the existing API.

	spec.in: bump libvirt version to 6.6.0-13
	The libvirt 6.6.0-13 version supports the following features that are
	needed for incremental backup:

	  1. Support checkpoint redefinition without the VM domain XML.

	  2. Verify the checkpoint when it is redefined using the new
	     VIR_DOMAIN_CHECKPOINT_CREATE_REDEFINE_VALIDATE flag.

	  3. Distinguish between errors when a backup is started by providing
	     the new VIR_ERR_CHECKPOINT_INCONSISTENT error.

	Bug-Url: https://bugzilla.redhat.com/1896245

2021-01-27  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.50.3

2021-01-27  Ales Musil  <amusil@redhat.com>

	net: Fix presence of DHCP and Autoconf in nmstate
	If the IP address was configured by means other than NM or nmstate,
	there is no way for nmstate to tell whether the IP was obtained by
	DHCP or autoconf.

	In order to fix the missing info from the nmstate state, default to
	False for both cases.

2021-01-26  Marcin Sobczyk  <msobczyk@redhat.com>

	ci: Fix rpmbuild path for PSI runs
	The 'rpmbuild' directory path is different when using mock
	vs when using PSI VMs. 'rpm --eval %_topdir' will always
	return the right path.

2021-01-26  Nir Soffer  <nsoffer@redhat.com>

	api: Remove leftovers from SPM removal
	When SDM.create_volume() was added we added new types to the schema in:

	commit bbbb72a192d8b54d21c8d65f6a10278404a966db

	    sdm: add SDM.create_volume API stub

	The API was never implemented and was removed in:

	commit d4b4ea395afa3a9b76bfe3eb9203cdce0f7d4e59

	    storage: Removed leftover code from SPM removal

	But the unused types were left - remove them.

2021-01-26  Vojtech Juranek  <vjuranek@redhat.com>

	tox: don't require yappi version
	Recently, we stopped requiring pytest and pytest-cov versions in
	commit

	    commit e0fdd9f4878eb82ceb077f7fe20325232329fb47
	    Author: Nir Soffer <nsoffer@redhat.com>
	    Date:   Thu Dec 10 18:59:50 2020 +0200

		tox: Use latest pytest and pytest-cov

	Similarly, don't require the yappi version, which is quite old, and
	use the latest one.

	Also, this old version fails to compile e.g. on my FC 32
	and FC 33:

	      _yappi.c: In function ‘_enum_threads’:
	      _yappi.c:800:39: error: invalid use of incomplete typedef ‘PyInterpreterState’ {aka ‘struct _is’}
		800 |     for (p=PyThreadState_GET()->interp->tstate_head ; p != NULL; p = p->next) {
		    |                                       ^~
	      _yappi.c: In function ‘set_clock_type’:
	      _yappi.c:1239:20: warning: comparison of integer expressions of different signedness: ‘int’ and ‘clock_type_t’ [-Wsign-compare]
	       1239 |     if (clock_type == get_timing_clock_type())
		    |                    ^~
	      error: command 'gcc' failed with exit status 1
	      ----------------------------------------
	      ERROR: Failed building wheel for yappi
	      Running setup.py clean for yappi
	    Failed to build yappi

	automation: let combined coverage command always succeed
	In a previous patch the execution of teardown was fixed, but CI still
	fails, as a failing coverage command fails the whole build:

	    22:45:21 Coverage.py warning: Couldn't read data from '/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/tests/.coverage-virt': UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 106: invalid start byte
	    22:45:21 + teardown
	    22:45:21 + res=1
	    22:45:21 + '[' 1 -ne 0 ']'
	    22:45:21 + echo '*** err: 1'
	    22:45:21 *** err: 1

	Make the coverage commands always succeed: if the tests pass, the job
	should succeed. A coverage error is clearly visible in the logs, and
	one should notice it when checking the summary coverage report and
	finding there is none (and if nobody checks the coverage report, we
	probably don't need it:-).

2021-01-26  Nir Soffer  <nsoffer@redhat.com>

	automation: Require coverage 5
	Looks like recent failures in coverage are caused by incompatibility
	between pytest and pytest-cov and older coverage[1]. We don't specify
	coverage version, so we get some old version that happened to be
	installed on the slaves.

	Upgrade to coverage >= 5. Since this version is not packaged yet for
	Centos, and this is developer tool and not runtime requirement, install
	coverage using pip. This will make it easier to use latest pytest and
	pytest-cov.

	Ideally this would be installed by tox in a virtual environment, but we
	use coverage outside of tox to create a combined coverage report. For
	the CI environment, using pip is fine.

	[1] https://stackoverflow.com/questions/59439831/coveralls-unicodedecodeerror

2021-01-26  Vojtech Juranek  <vjuranek@redhat.com>

	automation: don't use unsafe pushd
	Currently CI fails with

	    OSError: [Errno 16] Device or resource busy: '/var/lib/mock/epel-8-x86_64-6fbd7704c310804ff7016833dcc4221d-3162111/root/var/tmp/vdsm-storage/mount.file-512'

	which is a result of failed userstorage teardown, which fails with

	    python3: can't open file 'tests/storage/userstorage.py': [Errno 2] No such file or directory

	which is related to combined coverage command failure:

	    Coverage.py warning: Couldn't read data from '/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/tests/.coverage-gluster': UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 106: invalid start byte

	The automation script changes the directory to `tests` before running
	the combined coverage command and moves back to the original dir
	afterwards. However, if the command fails, it doesn't move back, and
	the following storage teardown fails as well, as the script is in the
	wrong directory and cannot find the userstorage script.

	Don't use the unsafe `pushd` command, but run the coverage commands in
	a subshell. Running `cd` in a subshell doesn't impact the parent shell,
	so even in case of failure, teardown should work as expected.

2021-01-26  Nir Soffer  <nsoffer@redhat.com>

	tests: Shorten long lines
	Fix flake8 violation that sneaked in while CI was broken.

2021-01-25  Eyal Shenitzky  <eshenitz@redhat.com>

	constants.py: add backup scratch disk type to content type
	When a backup is taken for a VM, a scratch disk is
	created for each disk that participates in the backup.

	This patch adds this disk type to the content type,
	so it will be easy to distinguish between scratch disks
	created during a backup and regular data disks.

	Bug-Url: https://bugzilla.redhat.com/1874483

	backup.py: skip parent validation in case of missing parent_checkpoint_id
	In case of a backup for RAW disks only, no checkpoint is
	created and parent_checkpoint_id will be None.

	The validation that matches the parent_checkpoint_id given by the
	engine against the leaf checkpoint ID defined on the host should be
	skipped in this case.

	Bug-Url: https://bugzilla.redhat.com/1915025
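
	A minimal sketch of the described logic (names are hypothetical, not
	vdsm's actual code):

```python
def validate_parent_checkpoint(parent_checkpoint_id, leaf_checkpoint_id):
    # RAW-only backups create no checkpoint, so there is no parent
    # to validate -- skip the check instead of failing.
    if parent_checkpoint_id is None:
        return
    if parent_checkpoint_id != leaf_checkpoint_id:
        raise ValueError(
            "parent checkpoint %r does not match leaf checkpoint %r"
            % (parent_checkpoint_id, leaf_checkpoint_id))
```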

2021-01-25  Vojtech Juranek  <vjuranek@redhat.com>

	tests: convert tests from TestImageTickets into functions
	As we converted the imagetickets tests to pytest in the previous patch,
	there's no need to group the tests in the TestImageTickets class, as
	all the tests are for image tickets. Remove this class and define the
	tests as ordinary functions.

	tests: convert imageticket tests to pytest
	Convert module imageticket_test to pytest, add fixtures for common
	monkeypatching and remove unused imports.

	tests: convert imagetickets tests to py3
	Convert imagetickets_test module to py3:
	- remove __future__ imports
	- import http.client directly
	- replace six.text_type() and keep only string
	- remove six dependency from

2021-01-24  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: perform checkpoint validation when checkpoint redefine
	Instruct libvirt to perform a checkpoint validation when the checkpoint
	is redefined.

	Libvirt will validate the metadata related to the disk state of
	the redefined checkpoint.

	Bug-Url: https://bugzilla.redhat.com/1896245

	backup.py: distinct broken checkpoint error when starting a backup
	Distinguish an inconsistent checkpoint error when starting
	a backup from all the other errors.

	This error may give more information to the engine about the type of
	the error and how to handle future backups for this VM backup chain.

	Bug-Url: https://bugzilla.redhat.com/1896245

2021-01-21  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: allow redefine checkpoints using backup configuration
	Libvirt no longer requires the VM domain XML when a checkpoint is redefined.

	VDSM can compose the checkpoint XML given the configuration
	of the backup that was taken for that checkpoint.

	This patch allows redefining a checkpoint in one of two ways:
	  1. By providing the checkpoint XML (as string)
	  2. By composing the checkpoint XML on the host using the backup
	     configuration.

	If both fields are given, the checkpoint will be redefined using the
	backup configuration. If neither field is given, the operation will
	fail.

	The support for redefining a checkpoint using the checkpoint XML
	remains in order to support older engines that do not support sending
	the backup configuration.

	Bug-Url: https://bugzilla.redhat.com/1901835
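
	The selection between the two ways can be sketched as follows (a
	simplified illustration; both function names are assumptions, not
	vdsm's API):

```python
def redefine_checkpoint_xml(checkpoint_xml=None, backup_config=None):
    # Prefer composing the XML from the backup configuration;
    # fall back to the provided XML string for older engines.
    if backup_config is not None:
        return compose_checkpoint_xml(backup_config)
    if checkpoint_xml is not None:
        return checkpoint_xml
    raise ValueError("Either checkpoint XML or backup configuration required")


def compose_checkpoint_xml(backup_config):
    # Placeholder for composing checkpoint XML from a backup config.
    return "<domaincheckpoint>%s</domaincheckpoint>" % backup_config
```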

2021-01-20  Sandro Bonazzola  <sbonazzo@redhat.com>

	dracut: fix conf syntax as per man page
	As per the dracut.conf man page, add a leading and trailing space
	to the omit_dracutmodules value.

	Bug-Url: https://bugzilla.redhat.com/1916947

2021-01-20  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.50.2

2021-01-19  Sandro Bonazzola  <sbonazzo@redhat.com>

	automation: align with ovirt-release
	- dropped nmstate and NetworkManager repos, as they are available in CentOS 8.3.
	- using repo names as in the jenkins mirror to allow mirror injection.

2021-01-14  Ales Musil  <amusil@redhat.com>

	virt: Fix exception when filter was missing on device update
	Updating a VM NIC without any filter caused an exception
	because the filter attribute was not defined.

	Set the filter to None by default to prevent this issue.

2021-01-13  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: sysprep: clean host name
	virt-sysprep does not remove the host name or set it to any generic
	value. It only removes it from ifcfg network-scripts (which likely does
	nothing at all). To avoid new VMs re-using the original host name we
	need to use '--hostname <new-hostname>' to change it. But since sealing
	is done when the template is created, we don't know the new host name
	yet. The only sane option seems to be to change the host name to some
	generic value.

	Bug-Url: https://bugzilla.redhat.com/1860492

2021-01-13  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.50.1

2021-01-13  Ales Musil  <amusil@redhat.com>

	net, tests: Skip sysfs bonding tests on travis
	The recent failure in Travis CI uncovered an interesting thing.

	A container is not able to load a kernel module unless it is
	privileged to do so, so any direct loading of the bonding module
	fails. The kernel can load the module on its own when we
	request creation of a new bond; however, the module loaded
	this way has some flaws. It allows creating a
	bond via sysfs or iproute2 but fails with a permission
	error when modifying the bond options via sysfs.
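
	Such a guard could be sketched like this (the environment variable and
	skip reason are assumptions for illustration, not the actual test
	code):

```python
import os
import unittest

# Travis sets TRAVIS=true in its build environment; unprivileged
# containers there cannot modify bonding options via sysfs.
skip_on_travis = unittest.skipIf(
    os.environ.get("TRAVIS") == "true",
    "unprivileged containers fail to modify bond options via sysfs",
)


@skip_on_travis
def test_bond_options():
    # Placeholder body; real tests would write to
    # /sys/class/net/<bond>/bonding/ options here.
    pass
```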

2021-01-13  Amit Bawer  <abawer@redhat.com>

	vm: Move live merge code to livemerge module
	Add livemerge module to virt so we can maintain live merge code
	separately from the huge vm.py codebase.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

	vm: Make private vm methods public for merge usage
	As preparation for separation of merge code from vm code
	we turn some of vm's private methods into public ones so they
	can be used from the merge module to come.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

	vm: Remove confLock from blockJobs sections
	The confLock is used for vm conf members while jobsLock is for
	blockJobs dict access, which is already locked when calling
	untrack and track block jobs methods.

	This is a leftover from earlier code stages, when the block jobs were
	maintained under the same conf dict of the vm:

	commit ad047ca551b3e7594a41f08a641d00df91645c26
	Author: Adam Litke <alitke@redhat.com>
	Date:   Thu Mar 20 17:02:03 2014 -0400

	    LiveMerge: Add block job info to VM stats

	Since we want to separate merge code from other vm code, we need to
	remove the unrequired confLock usage first for blockJobs.

	Also turn the blockJob methods into private ones, as they are only
	used from within the merge code, and slightly refactor
	untrack_block_job to handle a dict key error and drop an unused return
	value.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

2021-01-13  Ales Musil  <amusil@redhat.com>

	net, ci: Add substage that runs functional tests with ovs
	The condition to trigger those tests is the same as for the
	functional tests on linux bridge.

	net, tests: Replace hardcoded values with proper nmstate schema
	Some of the tests were using hardcoded values instead of schema
	classes from nmstate. This can lead to breakage of tests upon
	schema change.

	net, nmstate: Use Vlan schema instead of hardcoded values

	net, ovs: Add MTU support for OvS networks
	Due to its nature, MTU on OvS is enforced on the
	network itself and its base interface. The OvS
	bridge adjusts its MTU based on its ports. If there
	are multiple networks over a single interface with
	different MTU values, the MTU applied to the base
	interface is the highest of all networks.

2021-01-12  Amit Bawer  <abawer@redhat.com>

	merge_test: Add live merge errors tests
	Add tests for merge invocation errors:

	- blockCommit unrecoverable error.
	- commit block job which already exists.
	- raw base volume too small.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

2021-01-10  Amit Bawer  <abawer@redhat.com>

	merge_test: Add block commit cancellation test
	Test scenario for manual cancellation of a block commit job
	in progress.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

	merge_test: Add internal live merge test
	Internal merge does not have commit_ready stage like active merge.

	In this case:

	1) The block job completes automatically on qemu side,
	   switching to the new layer.
	2) libvirt sends an event.
	3) libvirt stops reporting the job.
	4) libvirt updates the xml.

	The code triggers cleanup when the job is gone without a pivot,
	and we do not wait for xml update.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

	merge_test: Add active live merge test
	Add a baseline test for live merge flow.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

	merge_test: Explicitly import virt.vm objects
	This will avoid name clashing in the live merge tests to follow.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

	vmfakelib: Update IRS methods for livemerge tests
	- Set getVolumeSize() to work with prepared volumes
	  and set the existing device_test to work accordingly
	  for a consolidated usage.

	- Add sendExtendMsg() API to test volume extension
	  calls during live merge tests.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

2021-01-07  Roman Bednar  <rbednar@redhat.com>

	asyncutils: fix a docstring typo
	Fixes a typo in function docstring.

2021-01-07  Ales Musil  <amusil@redhat.com>

	net, tests: Fix firewall service check
	Removal of the initscripts package accidentally broke a
	few tests, because the old way of checking whether a
	service is running was used. Replace the service
	call with a proper systemctl call.

2021-01-06  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: check for virt-* commands during runtime
	It is not the best idea to check for the existence of the virt-*
	commands during VDSM startup. When the commands are missing it is
	certainly bad, as it means some of the VDSM dependencies were not
	properly installed, but it should not prevent VDSM from starting up.
	Hence we drop the check and let the invocations fail only when the
	functionality is actually needed.

	While at it drop the use of CommandPath as it does not have any value
	when we look only for one fixed path.

2021-01-06  Nir Soffer  <nsoffer@redhat.com>

	asyncevent: Don't report callback=None
	When callback is None (handle already called) don't report
	callback=None.

	asyncevent: Fix __repr__ formatting bug
	Fix the common bug formatting a tuple with % by switching to
	str.format().

	tests: Add missing tests for asyncevent
	The internal Handle and Timer classes had no tests. Add the missing
	tests, revealing bug in Handle.__repr__.

2021-01-06  Jean-Louis Dupond  <jean-louis@dupond.be>

	virt: Update filter parameters live
	Allow filter and filterref parameters to be updated live.
	This happens when a filterrule is added/removed/updated.

	Bug-Url: https://bugzilla.redhat.com/1899583

2021-01-06  Ales Musil  <amusil@redhat.com>

	net, tests: Use latest pytest in containers
	The container for functional tests uses the latest pytest;
	it should be the same in the integration and unit containers.

	net: Remove nmstate workaround for DNS and route metadata
	nmstate 0.3 does not require this workaround as it is
	able to keep the metadata by itself.

	net: Update spec file and remove unused packages
	Some of the packages are not required by network
	package anymore and can be removed.

	net: Remove dhclient module
	Remove dhclient module and all remaining references.

2021-01-05  Nir Soffer  <nsoffer@redhat.com>

	tox: Use latest pytest and pytest-cov
	Now that we support only python 3, there is no need to use an ancient
	pytest version. Let's try to use the latest version and see if it works.

	Using the new tox reveals a stupid bug in Handle.__repr__ - not sure why:

	>   info.append("args=%s" % self._args)
	E   TypeError: not all arguments converted during string formatting

	Mark the failed test as xfail for now.

	tests: Fix asserts about raised errors
	When testing raised errors, we should check exception info value.
	Issue was revealed by upgrading to latest pytest version.

2020-12-17  Milan Zamazal  <mzamazal@redhat.com>

	virt: Fix the check for clientIp in migration
	In commit 01687aa242e8cb511d2ecce8199768e44e41feac,
	Vm.conf['clientIp'] value was moved to a Vm attribute, but the change
	has forgotten to update the lookup in migration.py.  That means the
	graphics ticket expiration is never updated on migrations and
	remote-viewer must be restarted after each migration.

	This patch fixes the refactoring omission.

	Bug-Url: https://bugzilla.redhat.com/1773922

2020-12-16  Ales Musil  <amusil@redhat.com>

	net, tests: Remove obsolete dynamic tests
	Tests that check whether dhclient is stopped are
	obsolete, as this has moved to nmstate.

	net, tests: Introduce Bond helper class
	The Bond helper follows the same implementation
	as any other device helper for tests. In order to
	keep the link_bond tests working the old
	bond_device was renamed to bond_device_link.

	The bond_device_link is going to be removed once
	the link.bond dependencies are resolved.

	net, tests: Avoid direct usage of ipwrapper for setup
	If we want to switch to NM for managing test interfaces,
	we should avoid direct usage of iproute in those interfaces
	and let them be managed by a common interface.

	net, tests: Introduce VethPair helper class

	net, tests: Bring dummy device up by default
	All helpers bring up their devices; align this
	behavior also for the dummy device helpers.

	net, tests: Refactor Interface helper classes
	Add common code to Interface helper classes and
	refactor uncommon methods to use snake case.

	net, tests: Use common return value from device helpers
	The return value of the *_device helpers from nettestlib was
	inconsistent: some of them were returning names
	and some of them an entire class instance. Make it consistent
	so all of them return the device name.

	One exception is bond_device which is handled
	differently.

2020-12-15  Ales Musil  <amusil@redhat.com>

	net, tests: Move bridge to nettestlib
	nettestlib should be a common place for various interface
	classes.

	net: Remove fc30 jobs for func tests
	fc30 nodes are not available anymore.
	We keep the check-network job on fc30
	for the time being, as we don't have
	anywhere else it could run.

2020-12-09  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.40

	virt: Add TPM hash to running VM stats
	It allows Engine to detect TPM data changes.  Engine can then request
	the updated TPM data using VM.getExternalData call.

	If there is no TPM in the VM or there is no TPM data, the item is not
	added to the stats.

	virt: New verb VM.getExternalData
	It serves to transfer TPM data to Engine.  In future, it may serve to
	transfer other similar data, such as secure boot data.

	The verb must be flexible enough to serve not only for transferring
	different kinds of data (TPM, secure boot) but also to transfer it
	under different circumstances and reliability/efficiency requirements.
	TPM data is retrieved and sent unconditionally after each request by
	default.  This is to support the first use case: Sending the final
	data when the VM is down, before it is (together with its data)
	undefined by libvirt.  Future Engine patches may utilize more relaxed
	approaches allowing periodic monitoring and retrieving and
	transferring the data only when needed.

	It is important to handle errors properly.  Otherwise Engine may fail
	to retrieve the data or it may fail to destroy the VM.

	virt: Monitor TPM data
	This patch introduces periodic reading of TPM data.  Data is considered
	stable if its reported hash doesn't change in the last two consecutive
	calls.  Only in such a case, or if forced reading is requested,
	typically after the VM is down, the data reported to Engine is
	updated.  This is to protect against data changes performed at the
	same time the data is read.

	We compute a cryptographic hash of the data for Engine, not to leak
	any information about TPM data in logs etc.

	Reporting the read TPM data to Engine will be addressed in followup
	patches.
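
	The stability rule ("unchanged hash in two consecutive reads, or a
	forced read") can be sketched like this (a hypothetical illustration,
	not vdsm's implementation):

```python
import hashlib


class StableDataMonitor:
    """Report data only once its hash is unchanged in two
    consecutive reads, or when a forced read is requested."""

    def __init__(self):
        self._last_hash = None
        self._stable_hash = None

    def update(self, data, force=False):
        new_hash = hashlib.sha256(data).hexdigest()
        if force or new_hash == self._last_hash:
            # Unchanged since the previous read (or forced):
            # safe to report, the data was not mid-write.
            self._stable_hash = new_hash
        self._last_hash = new_hash
        return self._stable_hash
```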

	virt: Write TPM data obtained from Engine
	When a VM with a TPM device is started, Engine should provide TPM
	data.  If the data is not provided, typically for new VMs or when a
	TPM device is added to the VM for the first time, initial data is
	created automatically by swtpm.

	This patch adds support for a new VM.create parameter that contains
	TPM data to start the VM with.  Support for reading TPM data and
	reporting it back to Engine will be added in followup patches.

	common: Allow protection of fields other than "password"
	Not every sensitive piece of information can be reasonably named
	"password".  For instance, TPM data, secure boot data, or their hashes
	are all pieces of information that shouldn't be logged.

	Include into the password protection all items with keys starting with
	"_X_" so that any piece of data can be protected from logging.

	virt: Support for reading and writing TPM data
	TPM is much more useful with TPM data persistence than without it.
	TPM emulation is handled by swtpm.  swtpm stores its data under
	libvirt directory for each running VM.  It uses data already present
	when the VM starts or new data is created if it's not present.  The
	data is deleted when a VM domain is undefined.  This patch introduces
	basic functions for reading and writing the TPM data for a given VM.

	There are important security considerations.  Supervdsm must be used
	to access TPM data because access to it is restricted.  Any user who
	can access supervdsmd or vdsmd can read and write the data, which is
	inherently needed to be able to transfer the data to or from Engine.
	We restrict supervdsm access to swtpm subdirectories, with properly
	formatted UUID names, to prevent changes outside it by calling Vdsm
	APIs.  We must also be cautious not to leak parts of TPM data when
	logging exceptions.  Although untrusted persons shouldn't have any
	access to hosts, there can still be QEMU bugs etc. and it's good to
	have basic restrictions.

	More TPM data processing functionality will be added, especially
	interaction with Engine, in followup patches.

	virt: Add support for local VM data storage
	There are features, such as TPM devices or secure boot, that require
	passing VM data between Vdsm and Engine.  The data must be stored in
	the local file system where it is read and written by virtualization
	tools.  Data can change as the VM runs and the changed data must be
	reported to Engine.

	This patch introduces support for such data by providing facilities
	for data reading, writing, ASCII encoding and decoding, and watching
	for changes.

	Note that we use system tar rather than Python native shutil package,
	for two reasons:

	- Extracting tar files using shutil is insecure and can modify data
	  outside the target directory.

	- tar can read and write archive data using pipes.

2020-12-09  Nir Soffer  <nsoffer@redhat.com>

	tool: sanlock: Speed up add_lockspace
	By configuring sanlock properly, we can speed up concurrent
	add_lockspace significantly:

	- our_host_name - If not configured, sanlock generates a new UUID on
	  each restart.  Using a constant host name, recovery from unclean
	  shutdown is 3 times faster. Using the host hardware id will make it
	  easier to detect which host is related to sanlock issues.

	- max_worker_threads - If not configured, sanlock uses 8 worker threads,
	  limiting concurrent add_lockspace calls. Using 40 worker threads,
	  activating a host with 40 storage domains is 3 times faster. We use 50
	  worker threads to optimize for 50 storage domains.

	NOTE: max_worker_threads is a new option that will be available in RHEL
	8.3.z. Adding this option now has no effect, but once the option will
	be available, users will enjoy the improvement without waiting for oVirt
	update.

	Bug-Url: https://bugzilla.redhat.com/1903358
	Bug-Url: https://bugzilla.redhat.com/1508098

	sanlockconf: Add helper for working with sanlock configuration
	Sanlock configuration uses the simplest possible configuration file
	format. Add a simple module for reading and writing this format.

	Example usage:

	    >>> conf = sanlockconf.load()
	    >>> conf
	    {'max_worker_threads': '50'}
	    >>> conf['our_host_name'] = 'c59d39ca-620b-4aad-8b50-97833e366664'
	    >>> sanlockconf.dump(conf)

	We will use this module to configure sanlock and check that sanlock is
	configured in vdsm init phase.
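
	The "simplest possible" key = value format could be parsed roughly
	like this (a sketch under that assumption, not the actual sanlockconf
	module):

```python
def parse_sanlock_conf(text):
    # Each non-comment line is "key = value"; comments start with "#".
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def dump_sanlock_conf(conf):
    # Serialize back to one "key = value" per line.
    return "".join("%s = %s\n" % (k, v) for k, v in conf.items())
```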

	tool: sanlock: Prepare for configuring sanlock.conf
	Clean up the messy isconfigured() by extracting small and simple
	functions for checking if sanlock groups are configured, and if sanlock
	needs to be restarted to apply groups changes.

	Refactor isconfigured() and configure() to make it easy to check and
	configure sanlock configuration file.

	Add common logging when checking and configuring sanlock, matching other
	configurators.

	tool: Use fileUtils.backup_file()
	Use fileutils.backup_file() in multipath and lvm configurators, removing
	duplicate code and unifying logging.

	fileUtils: Add backup_file helper
	When modifying configuration files, we back up the original file with a
	timestamp. We have 2 copies of this code, and we need it for backing up
	the sanlock configuration. Add a helper to avoid the duplication and
	have uniform backup behavior.

	tool: Use fileUtils.atomic_write()
	Use fileUtils.atomic_write() in mpathconf, multipath configurator, and
	lvm configurator. This removes duplicate code and fixes selinux labels for
	the modified files.

	fileUtils: Add atomic_write helper
	We have duplicate versions of this code in mpathconf and multipath
	configurator, and we need it now for sanlockconf module. Add a generic
	helper function and some tests.
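
	The general pattern behind such a helper looks like this (a simplified
	sketch; vdsm's actual helper also handles selinux labels and more):

```python
import os
import tempfile


def atomic_write(path, data, mode=0o644):
    # Write to a temporary file in the same directory, then rename.
    # rename() is atomic on POSIX, so readers see either the old or
    # the new content, never a partially written file.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)
    os.chmod(tmp, mode)
    os.rename(tmp, path)
```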

2020-12-08  Nir Soffer  <nsoffer@redhat.com>

	properties: Remove python 2 leftovers
	Remove unneeded six hacks and simplify super() usage.

	properties: Fix wrong class name
	When naming class properties, we shadowed the name argument, overriding
	the __class__.__name__ to "__init__" in all sub classes.

2020-12-08  Eyal Shenitzky  <eshenitz@redhat.com>

	sdm.add_bitmap.py: validate bitmap doesn't exist before adding it
	Validate that the bitmap doesn't exist in the chain before adding it.

	bitmaps.py: succeed in removing a bitmap if it doesn't exist
	In case of a non-existing bitmap that was asked to be removed,
	the call for removing it should succeed and not raise an
	exception that will cause the request from the engine to fail.

2020-12-07  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: set stop_backup as a success if VM is not defined
	If VM.stop_backup is called for an undefined VM (which can happen if
	the VM was powered off from the guest, for example), stopping the
	backup will fail and the scratch disks that were created for that
	backup will not be removed.

	This patch handles this by detecting the case when the VM is down,
	cleaning up the scratch disks and treating the operation as a success.

	Bug-Url: https://bugzilla.redhat.com/1900518

	backup_tests.py: add tests for require_consistency
	Bug-Url: https://bugzilla.redhat.com/1894413

	backup.py: add require_consistency flag to BackupConfig
	Adding the new require_consistency flag lets the backup app
	choose whether the backup should fail if the VM failed to freeze.

	If the VM failed to freeze, it might lead to inconsistent data in the
	backup, so in order to prevent that, require_consistency should be set
	to 'True'.

	The default value for require_consistency flag is 'False'.

	Bug-Url: https://bugzilla.redhat.com/1894413

2020-12-07  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: create logging runner for virt-* commands
	Bug-Url: https://bugzilla.redhat.com/1895843

	virt: utils: add run_logging()
	It is useful to store the standard output/error of certain
	commands in a log file for later reference. Such a log can be valuable
	when debugging errors, especially in situations where the command
	seemingly succeeds.

	The caller can choose whether to store output, error or both. Log files
	are stored in /var/log/vdsm/commands and are kept for 30 days. File name
	is <command>-<date_time>-<random_string>.log. Caller may optionally
	specify a tag (e.g. resource ID) to better distinguish the logs. In this
	case the file name is <command>-<tag>-<date_time>-<random_string>.log.

	Bug-Url: https://bugzilla.redhat.com/1895843
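
	The naming scheme could be produced by something like this (a sketch;
	the exact timestamp format and random-string length are assumptions):

```python
import random
import string
import time


def command_log_name(command, tag=None):
    # <command>[-<tag>]-<date_time>-<random_string>.log
    parts = [command]
    if tag is not None:
        parts.append(tag)
    parts.append(time.strftime("%Y%m%dT%H%M%S"))
    parts.append("".join(random.choices(string.ascii_lowercase, k=6)))
    return "-".join(parts) + ".log"
```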

2020-12-07  Liran Rotenberg  <lrotenbe@redhat.com>

	vmdevices: handle mapping of luns with image id
	LUNs were not set with a serial. This caused libvirt to auto-set it as
	the disk alias. We changed it to the LUN ID, but this harms backwards
	compatibility. Therefore, we are now aligned with non-LUN disks, having
	the serial set to the image ID. The only different case is a LUN with
	pass-through, where we can't set the serial and it will exist as the
	LUN ID. As part of this change, the mapping of LUNs needs to handle the
	new serial set by the engine, in order to provide the logical name of
	that device.

	Bug-Url: https://bugzilla.redhat.com/1904774

2020-12-07  Ales Musil  <amusil@redhat.com>

	net, virt: Move python3-dbus to common package
	Move the requirement for the python dbus library to the
	vdsm-common package. It is no longer used by
	network, and the only reference is in common.

	net: Remove old OvS
	Remove old OvS code in favor of nmstate OvS that
	should provide feature parity with linux bridge.

	virt: Remove direct connection of OvS bridge to VM
	The behavior was relying on "old" OvS code. At the same time,
	the engine and the XML generated by it do not support direct
	connection of an OvS bridge.

	net: Remove DPDK
	DPDK has been experimental and unmaintained in vdsm for
	quite a while. Remove all references to DPDK.

	Bug-Url: https://bugzilla.redhat.com/1899865

	virt: Remove dpdk support from VM networking
	Because dpdk is experimental and no longer maintained,
	it can be removed from VM networking.

	Bug-Url: https://bugzilla.redhat.com/1899865

	net, ovs: Use detection from RunningConfig in netinfo
	Rather than computing netinfo for OvS every time the
	openvswitch service is running, use RunningConfig
	to detect OvS.

	The openvswitch service should always run, even on hosts
	that use a linux bridge. This results in an unnecessary
	call to OvsInfo every time netinfo is called.

	net: Update _set_bond_type_by_usage
	Remove the reference to old OvS code and use the running
	config passed in instead of creating a new one.

2020-12-06  Nir Soffer  <nsoffer@redhat.com>

	readme: Update git URL
	The recent gerrit update broke the strange git URL. Replace it with a
	simpler, working one.

2020-12-02  Nir Soffer  <nsoffer@redhat.com>

	doc: Document I/O timeouts configuration
	Add document about I/O timeouts configuration. The document needs to be
	updated when we do more testing with non-default I/O timeouts, and when
	we get more info and recommended configuration from storage vendors
	using the new configuration.

	Preview of the rendered document:
	https://github.com/nirs/vdsm/blob/io-timeout/doc/io-timeouts.md

	multipath: Match no_path_retry to sanlock:io_timeout
	The current setting (4 retries) queues I/O for 20 seconds when all
	paths fail. This is too short, causing VMs to pause too quickly during
	a short storage outage, and it does not play well with the sanlock
	lease renewal timeout (80 seconds).

	Change the value to match the sanlock default renewal timeout. If users
	want to increase sanlock:io_timeout, they also need to increase the
	no_path_retry value.

	For example, to configure the system for storage "foobar" that needs 120
	seconds timeout during failover, users need to install this drop-in
	file on all hosts:

	    # /etc/multipath/conf.d/foobar.conf
	    # Configuration for foobar storage.
	    overrides {
	        # Queue I/O for 120 seconds when all paths fail
	        # (no_path_retry * polling_interval == 120).
	        no_path_retry 24
	    }

	To apply the new configuration you need to reload multipathd service:

	    systemctl reload multipathd

	This change increases the time vdsm commands are blocked on inaccessible
	storage. This may cause timeouts and delays in unrelated storage, in
	particular when using a large number of storage domains, so it requires
	a lot of testing.

	clusterlock: Move initSANLock to SANLock class
	The only caller is SANLock.initLock(), but the implementation was a
	function in the module, using its own logger. Move the function to the
	proper place and remove the unneeded and badly named logger.

	config: Configurable sanlock io_timeout
	Allow configuring sanlock I/O timeout, which is the base for all sanlock
	timeouts[1].  Using a larger value makes VMs more resilient to short
	storage outages, but increases the time to fail over a VM and to
	acquire a host id.

	Add new sanlock:io_timeout option, using sanlock default (10 seconds).
	Increasing this value needs more testing and may require additional
	configuration on engine side.

	Using this configuration storage vendors can tune the system to work
	better with storage servers that cannot work with current sanlock
	renewal timeout (8 * io_timeout, 80 seconds).

	Multipath no_path_retry must match the sanlock renewal timeout. When
	modifying sanlock:io_timeout, users should also modify multipath
	defaults/no_path_retry and overrides/no_path_retry.

	During upgrades, we may have a mix of old hosts using the old default
	io_timeout and new hosts using a possibly larger io_timeout. Sanlock
	supports hosts using different I/O timeouts in the same lockspace.

	To configure the sanlock I/O timeout for storage "foobar" that needs
	120 seconds timeout during failover, you need to install this drop-in
	configuration file on all hosts:

	    # /etc/vdsm/vdsm.conf.d/99-foobar.conf
	    # Configuration for foobar storage.
	    [sanlock]
	    # Set renewal timeout to 120 seconds
	    # (8 * io_timeout == 120).
	    io_timeout = 15

	To apply the new configuration you need to restart vdsmd service, move
	the host to maintenance, and activate the host.

	[1] https://pagure.io/sanlock/raw/master/f/src/timeouts.h
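
	The relation between the two knobs, as described above, can be checked
	numerically (assuming the default multipath polling_interval of 5
	seconds):

```python
def sanlock_renewal_timeout(io_timeout):
    # sanlock renewal timeout is 8 * io_timeout (see [1] above).
    return 8 * io_timeout


def matching_no_path_retry(io_timeout, polling_interval=5):
    # multipath should queue I/O as long as the renewal timeout:
    # no_path_retry * polling_interval == 8 * io_timeout.
    return sanlock_renewal_timeout(io_timeout) // polling_interval
```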

2020-12-02  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.39

2020-12-02  Vojtech Juranek  <vjuranek@redhat.com>

	tests: normalize XML strings before asserts
	Use normalized() function before comparing XML strings instead of
	indented() function to avoid test failures on Python 3.8 and later,
	which don't sort element attributes.

	The patch includes the following changes:
	- move the indented() function into testlib, as it is used only by
	  tests
	- add an option (turned on by default) to sort attributes lexically
	- rename the function to normalized() to reflect its functionality
	- use normalized() instead of indented() in tests.

	xml: provide function for ordering XML attributes
	Prior to Python 3.8, XML attributes were re-ordered lexically when
	parsed. This changed in Python 3.8 (see [1]): attributes are now
	kept in the same order as provided by the user.

	Example in Python 3.6.8:

	    Python 3.6.8 (default, Apr 16 2020, 01:36:27)
	    [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)] on linux
	    Type "help", "copyright", "credits" or "license" for more information.
	    >>> import xml.etree.ElementTree as ET
	    >>> ET.tostring(ET.fromstring("<test arg2='2' arg1='1' />"))
	    b'<test arg1="1" arg2="2" />'

	Same in Python 3.8.6:

	    Python 3.8.6 (default, Sep 25 2020, 00:00:00)
	    [GCC 10.2.1 20200723 (Red Hat 10.2.1-1)] on linux
	    Type "help", "copyright", "credits" or "license" for more information.
	    >>> import xml.etree.ElementTree as ET
	    >>> ET.tostring(ET.fromstring("<test arg2='2' arg1='1' />"))
	    b'<test arg2="2" arg1="1" />'

	Add a function which orders attributes and thus can mimic the old
	behaviour. This is useful mainly for the tests which rely on
	automatic ordering of attributes and now fail with Python 3.8. We
	either have to order attributes lexically or adjust all the XML
	samples which we use for test asserts.

	As shown above, importing and re-exporting XML via ElementTree
	doesn't work, and minidom doesn't work either:

	    Python 3.8.6 (default, Sep 25 2020, 00:00:00)
	    [GCC 10.2.1 20200723 (Red Hat 10.2.1-1)] on linux
	    Type "help", "copyright", "credits" or "license" for more information.
	    >>> from xml.dom.minidom import parseString
	    >>> parseString("<test arg2='2' arg1='1' />").toxml()
	    '<?xml version="1.0" ?><test arg2="2" arg1="1"/>'

	Unless I missed some functionality/option which enables
	backward-compatible lexical ordering, a custom function for
	attribute ordering seems to be the only way to do it.

	[1] https://bugs.python.org/issue34160
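	A minimal sketch of such an attribute-ordering function, assuming
	ElementTree and re-serialization to a unicode string (the real
	normalized() helper in testlib may differ):

```python
import xml.etree.ElementTree as ET

def normalized(xml_text):
    # Parse and rebuild each element's attribute dict in lexical
    # order, restoring the pre-3.8 serialization behaviour.
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.attrib = dict(sorted(elem.attrib.items()))
    return ET.tostring(root, encoding="unicode")
```

	With this, both orderings of the same attributes serialize
	identically, so XML assertions pass on any Python version.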

2020-12-01  Nir Soffer  <nsoffer@redhat.com>

	clusterlock: Move host id logs to clusterlock
	Remove duplicate log about acquiring the host id from the monitor, since
	we already log this event in clusterlock.

	Move the warning about losing the host id to the clusterlock. This is
	a better place to log it, since the cluster lock keeps the state of
	the host id.

	clusterlock: Time async add_lockspace
	add_lockspace() may be called with wait=False, starting an async
	operation. In this case the stopwatch time is not useful for
	measuring the time to acquire the host id.

	We may detect the completion when calling add_lockspace() with wait=True
	by another thread, or when calling inq_lockspace(), typically from the
	domain monitor thread.

	Replace the stopwatch with helpers to measure and log the time to
	acquire the host id, handling both async and sync calls.

	Here are example logs recorded when a host is activated:

	2020-11-26 00:11:55,467+0200 INFO  (monitor/06e43bf) [storage.SANLock]
	Host id 1 for domain 06e43bfc-2ffe-419e-843b-59a5d23165e1 acquired in 30
	seconds (clusterlock:407)

	2020-11-26 00:12:04,489+0200 INFO  (monitor/a87b183) [storage.SANLock]
	Host id 1 for domain a87b183d-df2d-4de3-b370-507c6261c328 acquired in 30
	seconds (clusterlock:407)

	contrib: Add script for blocking storage
	When testing sanlock and other negative storage flows, it is useful
	to temporarily block access to a storage server. Add a script to make
	this easy.

	Examples

	- Blocking all outgoing traffic to server my.storage until the script is
	  interrupted:

	  $ sudo contrib/block my.storage

	- Blocking outgoing traffic to NFS server my.storage for 60 seconds:

	  $ sudo contrib/block --port 2049 --duration 60 my.storage
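	One possible shape for such a script, sketched with iptables; the
	exact rule and flags used by contrib/block are assumptions:

```python
import subprocess

def block_rule(server, port=None):
    """Build an iptables rule dropping outgoing traffic to server,
    optionally limited to a TCP port (e.g. 2049 for NFS)."""
    rule = ["OUTPUT", "-d", server, "-j", "DROP"]
    if port is not None:
        # Insert the port match before the jump target.
        rule[3:3] = ["-p", "tcp", "--dport", str(port)]
    return rule

def block(server, port=None):
    subprocess.run(["iptables", "-A"] + block_rule(server, port),
                   check=True)

def unblock(server, port=None):
    subprocess.run(["iptables", "-D"] + block_rule(server, port),
                   check=True)
```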

2020-12-01  Tomáš Golembiovský  <tgolembi@redhat.com>

	vm: remember UP state after power up
	When the VM starts and we notice that the QEMU Guest Agent is up, we
	consider the VM to be in the UP state. This state is, however, not
	remembered in the VM object, which holds a separate status. If the
	agent disappears or fails for some reason shortly after boot, the UP
	state is not remembered and the VM goes back into POWERING_UP until
	the timeout is reached and the VM is considered "definitely UP".

	This behavior is wrong: once we proclaim the VM as UP, it should stay
	UP and not fall back to POWERING_UP. This is solved by remembering
	the state in the VM object. Ideally we should not keep two or three
	different pieces of information about the VM state scattered
	throughout the code, but that calls for a larger refactoring.

	The issue is easily triggered when a backup is invoked immediately
	when the VM gets to the UP state. This calls fsfreeze in the guest,
	which disables some of the commands in the agent (like
	guest-get-users). When we try to call such commands, the calls fail
	and we don't consider the agent to be in a good state for some time.
	During this window the state from the agent is not considered valid,
	which results in falling back to the VM's internal state (which can
	still be POWERING_UP). While our broken handling of the agent after
	fsfreeze is normally not an issue (when done later during the VM
	lifetime), it should also be fixed, but that is for a separate patch
	too.

	Bug-Url: https://bugzilla.redhat.com/1893656

2020-12-01  Ales Musil  <amusil@redhat.com>

	net, tests: Remove compat module
	With Python 2 being unsupported, we don't need the compat module
	anymore.

	net, tests: Clear unused leftovers after legacy removal
	Some parts of the code remained after legacy removal
	but are unused. This was detected by use of vulture.

	net: Clear unused leftovers after legacy removal
	Some parts of the code remained after legacy removal
	but are unused. This was detected by use of vulture.

2020-11-30  Eyal Shenitzky  <eshenitz@redhat.com>

	volume.py: introduce add/remove bitmap for a volume
	New SDM.add_bitmap will be used to start an offline cold backup
	for a volume that is part of a powered-off VM.

	The new add_bitmap API will add a bitmap to the given volume; this
	dirty bitmap will be exposed later via the qemu-nbd server.

	Request for SDM.add_bitmap
	{
	  'job_id': job-uuid,
	  'vol_info':
	    {
	      'sd_id': sd-uuid,
	      'img_id': image-uuid,
	      'vol_id': volume-uuid,
	      'generation': 1
	    },
	  'bitmap': 'bitmap1'
	}

	The new remove_bitmap API will remove a bitmap from the given volume.
	This is currently not used by the engine, but may be needed in the
	'vdsm-client' tool.

	Request for SDM.remove_bitmap -
	{
	  'job_id': job-uuid,
	  'vol_info':
	    {
	      'sd_id': sd-uuid,
	      'img_id': image-uuid,
	      'vol_id': volume-uuid,
	      'generation': 1
	    },
	  'bitmap': 'bitmap1'
	}

	Bug-Url: https://bugzilla.redhat.com/1891470
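	The request dicts above can be built with a small helper; this is an
	illustrative sketch, not vdsm code:

```python
def bitmap_request(job_id, sd_id, img_id, vol_id, generation, bitmap):
    """Build an SDM.add_bitmap/remove_bitmap request payload in the
    shape shown above (all identifiers are caller-provided UUIDs)."""
    return {
        "job_id": job_id,
        "vol_info": {
            "sd_id": sd_id,
            "img_id": img_id,
            "vol_id": vol_id,
            "generation": generation,
        },
        "bitmap": bitmap,
    }
```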

2020-11-30  Ales Musil  <amusil@redhat.com>

	net, nmstate: Remove try/catch from schema import
	nmstate is a required dependency of vdsm and should be included every
	time. If there is any issue with the nmstate schema, we should fail
	nevertheless.

	net, tests: Install nmstate in unit test container
	Because nmstate is a required dependency of vdsm, it should be
	installed in all relevant test environments. This will allow us to
	use the nmstate schema directly and possibly detect some
	inconsistencies between versions.

2020-11-25  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.38

2020-11-23  Vojtech Juranek  <vjuranek@redhat.com>

	tests: fix metadata_storage_test
	Running metadata_storage_test with python 3.8, many tests fail with

	    E   AssertionError: '<?xm[53 chars]vice devtype="disk" name="vda">\n        <devi[689 chars]m>\n' != '<?xm[53 chars]vice name="vda" devtype="disk">\n        <devi[689 chars]m>\n'
	    E     <?xml version='1.0' encoding='utf-8'?>
	    E     <vm>
	    E   -     <device devtype="disk" name="vda">
	    E   +     <device name="vda" devtype="disk">

	as the identifying attrs are a dict with keys "devtype" and "name"
	(in this order), in this case specified in
	_get_drive_conf_identifying_attrs(), while all expected XML samples
	use the ordering "name" and "devtype". Fix the expected XML samples
	to use the correct ordering.

	Tests pass on CI as it uses an older Python version. Starting with
	Python 3.8, XML attributes are kept in the same order as provided by
	the user, while in previous versions attributes were ordered
	lexically; see [1] for more details.

	[1] https://bugs.python.org/issue34160

2020-11-22  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: fix memory reporting when NVDIMM is attached
	The memory of the VM is reported incorrectly when an NVDIMM is
	attached to the VM. This stems from the fact that the <memory>
	element in the libvirt XML includes not only physical memory but also
	the size of all NVDIMMs. To make the situation more complicated, the
	initial XML delivered from Engine does not include NVDIMMs, only
	physical memory.

	While it is not possible to migrate a VM with an NVDIMM at the
	moment, the patch tries to be future-proof in this respect. If the VM
	is migrated, we assume that the memory in the source XML already
	includes the NVDIMM sizes.

	Bug-Url: https://bugzilla.redhat.com/1893773
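	The correction described above amounts to subtracting NVDIMM sizes
	from libvirt's <memory> value; a minimal sketch (function name and
	KiB units are illustrative):

```python
def physical_memory_kib(libvirt_memory_kib, nvdimm_sizes_kib):
    """Return the VM's physical memory: libvirt's <memory> element
    includes the sizes of all NVDIMMs, so subtract them out."""
    return libvirt_memory_kib - sum(nvdimm_sizes_kib)
```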

	virt: add xml source argument and property to domain descriptor
	`initial=True` covered the case of an initial VM start as well as
	migration, where the XML source is libvirt on the source host.
	Instead of just having one flag, it is better to separate these two
	cases and thus keep more information about the source of the XML.
	That way we can distinguish whether the initial XML comes from Engine
	or from the migration source. DomainDescriptor does not really care
	about this distinction, but there is a follow-up patch where we need
	to handle only VM start and not migrations.

	Bug-Url: https://bugzilla.redhat.com/1893773

2020-11-22  Dan Kenigsberg  <danken@redhat.com>

	spec: simplify device-mapper-multipath stanza
	Our single supported platform (el8) has modern-enough
	device-mapper-multipath-0.8.3, so there is no need for years-old
	complexity and its adjoining explanations.

2020-11-19  Milan Zamazal  <mzamazal@redhat.com>

	tests: virt: Re-enable vm_libvirt_hook_test.py

	tests: Convert virt tests to pytest

2020-11-19  Liran Rotenberg  <lrotenbe@redhat.com>

	vmdevices: fix lun mapping
	Without pass-through to the LUN disk, the serial will be partial to
	the GUID provided for it. That scenario applies when the engine
	provides the serial for such a disk; otherwise the disk won't have
	any serial, or will have one created automatically.

	Bug-Url: https://bugzilla.redhat.com/1859092

2020-11-19  Nir Soffer  <nsoffer@redhat.com>

	qemuimg: Add backing chain support to info()
	Modern qemu-img info supports --backing-chain, returning info for the
	entire chain in one call. Add a backing_chain argument that adds this
	flag.

	Here is example usage:

	>>> qemuimg.info("top.qcow2", backing_chain=True)
	[
	    {
	        "backing-filename-format": "qcow2",
	        "virtual-size": 107374182400,
	        "filename": "top.qcow2",
	        "cluster-size": 65536,
	        "format": "qcow2",
	        "actual-size": 208896,
	        "format-specific": {
	            "type": "qcow2",
	            "data": {
	                "compat": "1.1",
	                "compression-type": "zlib",
	                "lazy-refcounts": false,
	                "bitmaps": [
	                    {
	                        "flags": [
	                            "auto"
	                        ],
	                        "name": "b0",
	                        "granularity": 65536
	                    }
	                ],
	                "refcount-bits": 16,
	                "corrupt": false
	            }
	        },
	        "full-backing-filename": "base.qcow2",
	        "backing-filename": "base.qcow2",
	        "dirty-flag": false
	    },
	    {
	        "virtual-size": 107374182400,
	        "filename": "base.qcow2",
	        "cluster-size": 65536,
	        "format": "qcow2",
	        "actual-size": 200704,
	        "format-specific": {
	            "type": "qcow2",
	            "data": {
	                "compat": "1.1",
	                "compression-type": "zlib",
	                "lazy-refcounts": false,
	                "refcount-bits": 16,
	                "corrupt": false
	            }
	        },
	        "dirty-flag": false
	    }
	]

	We will use this for validating bitmaps in the entire chain. It can also
	be useful for synchronizing vdsm metadata with qcow2 metadata and for
	logging actual image chain.
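	A minimal sketch of the command line involved; --backing-chain and
	--output json are real qemu-img flags, while the helper itself is
	illustrative:

```python
def info_command(image, backing_chain=False):
    """Build the qemu-img info command, optionally requesting info
    for the entire backing chain in a single call."""
    cmd = ["qemu-img", "info", "--output", "json"]
    if backing_chain:
        cmd.append("--backing-chain")
    cmd.append(image)
    return cmd
```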

	qemuimg: Return qemu-img info output as is
	We used to report only some of the info returned by qemu-img info,
	change the names, and move some format specific data to the top level of
	the dict.

	In general, modifying the output of a tool in a tool wrapper is a bad
	idea. Instead of reusing knowledge of the real tool when working with
	the wrapper, we now need to work with 2 different formats. This makes
	no sense.

	Moving format-specific data to the top level is useful, saving some
	typing in code accessing these keys, but it is not worth the effort
	of maintaining the wrapper and modifying it each time a new key is
	needed.

	Now we return the actual output from qemu-img info as is.

2020-11-18  Dan Kenigsberg  <danken@redhat.com>

	virt.sampling: rename local variables
	"doms" is set to the list of responsive doms, in case bulk sampling
	failed. This patch renames the variable so its purpose is clearer and no
	longer requires a comment to explain it.

2020-11-18  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.37

2020-11-18  Ales Musil  <amusil@redhat.com>

	net, ovs: Configure OvS bridge mappings
	The OvS bridge mappings are used for OVN
	networks that are connected to the underlying
	"physical" network.

2020-11-17  Nir Soffer  <nsoffer@redhat.com>

	tests: Remove ovirt-imageio-client version
	We have the ovirt-imageio-preview repo so we get the latest version from
	this repo.

2020-11-16  Milan Zamazal  <mzamazal@redhat.com>

	tests: virt: TestVmXmlHelpers.test_pretty_format_timing test

	tests: virt: Remove py2 markers
	Python 2 is not supported anymore.

	tests: Update README
	There is some outdated information there, let's make it up-to-date.

2020-11-12  Ehud Yonasi  <eyonasi@redhat.com>

	nmstate: run podman in rhel8.
	In order to run the jobs with the respective rhel8 server, we need to
	specify in the runtime requirements that the host will be the same as
	or newer than the distro we are running on, so that PSI will choose
	the right image. Also, I've changed the mounts to be only for fc30,
	as we don't run mock in PSI for VMs.

2020-11-11  Nir Soffer  <nsoffer@redhat.com>

	docs: Update iscsi-server documentation
	Update the document for the new unified target script, and improve
	examples and formatting.

	For rendered document see:
	https://github.com/nirs/vdsm/blob/target/doc/iscsi-server-setup.md

	contrib: Merge target tools to single command
	Replace contrib/target-tools/{create,delete}-target scripts with single
	script: contrib/target.

	This eliminates a lot of duplication in argument parsing and
	hard-coded layout.

	Here is the new command line:

	$ contrib/target -h
	usage: target [-h] [-s LUN_SIZE] [-n LUN_COUNT] [-r ROOT_DIR] [-i IQN_BASE] [--cache]
	              {create,delete} target_name

	Manage iSCSI targets

	positional arguments:
	  {create,delete}       Action to take.
	  target_name           Target name.

	optional arguments:
	  -h, --help            show this help message and exit
	  -s LUN_SIZE, --lun-size LUN_SIZE
	                        LUN size in GiB (default 100).
	  -n LUN_COUNT, --lun-count LUN_COUNT
	                        Number of LUNs (default 10).
	  -r ROOT_DIR, --root-dir ROOT_DIR
	                        Root directory (default /target).
	  -i IQN_BASE, --iqn-base IQN_BASE
	                        IQN base name (default iqn.2003-01.org).
	  --cache               Enable write cache. Enabling write cache improves performance
	                        but increases the chance of data loss. May cause trouble when
	                        running many services on the same server, but works fine if
	                        server is used only for storage. (default False).

	To create a new target:

	    contrib/target create my-target

	To delete the target:

	    contrib/target delete my-target
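	The usage text above maps to an argparse setup along these lines (a
	sketch; the actual contrib/target script may differ in details):

```python
import argparse

def make_parser():
    # Mirror the usage text shown above: one action, one target name,
    # and the LUN/layout options with their documented defaults.
    parser = argparse.ArgumentParser(
        prog="target", description="Manage iSCSI targets")
    parser.add_argument("action", choices=["create", "delete"],
                        help="Action to take.")
    parser.add_argument("target_name", help="Target name.")
    parser.add_argument("-s", "--lun-size", type=int, default=100,
                        help="LUN size in GiB (default 100).")
    parser.add_argument("-n", "--lun-count", type=int, default=10,
                        help="Number of LUNs (default 10).")
    parser.add_argument("-r", "--root-dir", default="/target",
                        help="Root directory (default /target).")
    parser.add_argument("-i", "--iqn-base", default="iqn.2003-01.org",
                        help="IQN base name (default iqn.2003-01.org).")
    parser.add_argument("--cache", action="store_true",
                        help="Enable write cache (default False).")
    return parser
```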

	target-tools: Replace optparse with argparse
	optparse has been deprecated since Python 2.7. argparse is nicer and
	needs a little less boilerplate.

	create-target: Add option to enable cache
	Using write_back=true when creating a fileio backstore, or setting
	the emulate_write_cache=1 attribute, gives a huge speedup. This can
	cause trouble because of increased memory usage, but if the storage
	server is running in a VM, this seems to work fine.

	To enable cache when creating a target use:

	    create-target --cache target-name

	Here is example test with both configurations:

	cache      iops
	       read    write
	--------------------
	on     39014   16691
	off     3392    1456

	Tested with this fio configuration:

	$ cat mixed.fio
	[global]
	size=256m
	runtime=60
	time_based=1
	ioengine=libaio
	direct=1
	ramp_time=10

	[read]
	readwrite=randrw
	rwmixread=70
	rwmixwrite=30
	iodepth=64
	blocksize=4k

	$ fio mixed.fio --filename /dev/sdb

2020-11-11  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.36

2020-11-10  Nir Soffer  <nsoffer@redhat.com>

	nbd: Support dirty bitmaps
	Add a bitmap argument to NBDServerConfig. When set, export the
	specified bitmap in qemu-nbd. An NBD client can use the dirty bitmap
	to download changed blocks when a VM is powered off.

	The current code works for the simple case of a single volume, and can
	be used for initial testing with engine.

	Issues:

	- Support volume chain: create an overlay and merge the bitmaps from all
	  volumes to the overlay.

	- Handle errors like requesting bitmaps for raw volume, missing bitmaps
	  or invalid bitmap.

	- Using the private nbd module from ovirt_imageio. This module should
	  become public in a future imageio release.

2020-11-10  Ales Musil  <amusil@redhat.com>

	net: Rename and move py2to3 module to conversion_utils
	Rename the py2to3 module, as Python 2 is no longer
	supported. This way we can remove the compatibility
	code from the module and use it as a regular util
	where it is needed.

2020-11-10  Bell Levin  <blevin@redhat.com>

	net: Replace updating of vfs with nmstate

	net, nmstate: Add update vfs nmstate option
	Simplify the old way of setting up SR-IOV through sysfs
	by replacing it with nmstate, which is the easier and
	more common tool.

	Create update_num_vfs, which can be called through the API
	to update the number of VFs.

2020-11-10  Marcin Sobczyk  <msobczyk@redhat.com>

	spec: Fix sudoers drop-in config permissions
	Sudoers drop-in config files should have 440 permissions -
	but let's make sure all of these are handled properly during packaging
	regardless of the permissions set on the actual file in git repo.

	Bug-Url: https://bugzilla.redhat.com/1895015

2020-11-10  Andrej Cernek  <acernek@redhat.com>

	net, tests: Unify container tests run

2020-11-09  Nir Soffer  <nsoffer@redhat.com>

	transientdisk: Support backing file
	During cold backup we need to create a temporary overlay file for
	merging bitmaps before the backup, and remove the overlay after the
	backup. The transientdisk module provides exactly what we need,
	except support for a backing file. Add backing= and backing_format=
	arguments to support this use case.

	When creating an overlay, qemuimg.create() can use the backing file
	size, so make the size argument optional.

	qemuimg: Report backing file format
	This is useful for tests and for code that wants to inspect the
	backing chain. The value is returned only if the image was created
	with the -F argument. Creating images with backing files without
	specifying the backing format is deprecated and logs a warning. We
	need to clean up some test code using this shortcut.

	The new key "backingformat" uses a format similar to the
	"backingfile" key, for consistency.

2020-11-09  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: handle new format of guest-get-devices
	The API for guest-get-devices changed before the QEMU 5.2 release.
	The date is now expressed as an integer containing nanoseconds since
	the epoch, and 'address' is now called 'id' and has a flat structure.
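	Handling the new integer date format can be sketched like this (the
	helper name is illustrative):

```python
from datetime import datetime, timezone

def parse_driver_date(ns_since_epoch):
    """Convert the new guest-get-devices date format, an integer of
    nanoseconds since the epoch, into a timezone-aware datetime."""
    return datetime.fromtimestamp(ns_since_epoch / 1e9, tz=timezone.utc)
```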

2020-11-03  Ales Musil  <amusil@redhat.com>

	vdsm-tool: Fix man formatting for vdsm-tool
	Due to some changes in the man file the
	rendered man page had some wrong indentations.

	vdsm-tool: Remove unused items from manual

2020-11-02  Tomáš Golembiovský  <tgolembi@redhat.com>

	vm: hide traceback from failed qemu-ga powercycle
	When we expect that the agent is not running, don't log a traceback
	from a failed qemu-ga command. Log it only at DEBUG level. This
	prevents polluting logs with confusing tracebacks like:

	    Traceback (most recent call last):
	    File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 4792, in qemuGuestAgentShutdown
	        libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
	    File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
	        ret = attr(*args, **kwargs)
	    File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
	        ret = f(*args, **kwargs)
	    File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
	        return func(inst, *args, **kwargs)
	    File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2685, in shutdownFlags
	        if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self)
	    libvirt.libvirtError: Guest agent is not responding: QEMU guest agent is not connected

	vm: use proper logger for qemu-ga powercycle commands
	Use self.log instead of logging module.

	vmpowerdown: fix log message for qemu-ga reboot

2020-11-02  Amit Bawer  <abawer@redhat.com>

	tool: Handle template parent volumes in dump-volume-chains
	Previously we used StorageDomain.getVolumes(), which also included
	template volumes in the response. Now we build the volumes list
	per image in the client, and we did not consider the case of
	template volumes.

	This is fixed by a two-pass processing over the volumes result
	from the storage domain dump API: on the first pass we iterate
	the volume info results and group them into a volumes_info dict
	by the image id of each volume. On the second pass we iterate the
	resulting volume info by image and add the parent volume info from
	the volumes result if it is missing in volumes_info for that
	particular image, indicating it has a parent template volume from
	another image.

	Output before fix:

	   image:    a3749e70-1598-49d4-86a0-2d6ccb78a106

	             Error: no volume with a parent volume Id _BLANK_UUID found e.g: (a<-b), (b<-c)

	             Unordered volumes and children:

	             - 9fcaac13-652a-4d23-a5ff-7162ab3a87b7 <- ffa30cde-7346-437b-923c-2cfd4e3d6216
	               status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE, capacity: 107374182400, truesize: 1073741824

	Output after fix:

	   image:    a3749e70-1598-49d4-86a0-2d6ccb78a106

	             - 9fcaac13-652a-4d23-a5ff-7162ab3a87b7
	               status: OK, voltype: SHARED, format: COW, legality: LEGAL, type: SPARSE, capacity: 107374182400, truesize: 3758096384

	             - ffa30cde-7346-437b-923c-2cfd4e3d6216
	               status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE, capacity: 107374182400, truesize: 1073741824

	Base volume is shared from another image:

	    "9fcaac13-652a-4d23-a5ff-7162ab3a87b7": {
	        "apparentsize": 3758096384,
	        "capacity": 107374182400,
	        "ctime": 1596547254,
	        "description": "",
	        "disktype": "DATA",
	        "format": "COW",
	        "generation": 0,
	        "image": "45563a94-6498-4c42-ba00-8930a51c81ca",
	        "legality": "LEGAL",
	        "mdslot": 3,
	        "parent": "00000000-0000-0000-0000-000000000000",
	        "status": "OK",
	        "truesize": 3758096384,
	        "type": "SPARSE",
	        "voltype": "SHARED"
	    }

	Bug-Url: https://bugzilla.redhat.com/1839444
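	The two-pass processing above can be sketched as follows (a
	simplified model; the real dump-volume-chains code handles more
	fields):

```python
def group_volumes_by_image(volumes):
    """volumes: dict of vol_id -> info dicts with 'image' and
    'parent' keys, as in the storage domain dump API."""
    volumes_info = {}
    # First pass: group volume info by the image id of each volume.
    for vol_id, info in volumes.items():
        volumes_info.setdefault(info["image"], {})[vol_id] = info
    # Second pass: if a volume's parent lives in another image (a
    # template), add the parent's info to this image's group too.
    for img_vols in volumes_info.values():
        for info in list(img_vols.values()):
            parent = info["parent"]
            if parent in volumes and parent not in img_vols:
                img_vols[parent] = volumes[parent]
    return volumes_info
```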

	tool: Normalize parent volume info in dump-volume-chains
	This step of pre-processing will help other steps in chain
	resolution relying on parent information.

	Bug-Url: https://bugzilla.redhat.com/1839444

2020-11-02  Ales Musil  <amusil@redhat.com>

	net: Remove ifacetracking module
	Ifacetracking was used by the old dhclient and referenced
	from ifcfg. Both of these are removed, thus remove the
	ifacetracking module as well.

	net: Clear link setup
	Remove SetupBonds because it is not used anymore.

	net: Remove nm module
	The direct access to NetworkManager is not
	used anymore. Remove the whole module and all
	references.

	net: Remove ifcfg related code
	Ifcfg module and directly related modules are not
	used anymore.

	net: Remove wait_for_ipv4 from vdsm init
	This script was waiting for getting IPv4 address
	on devices that were configured with vdsm ifcfg
	header. This is no longer the case as everything
	is configured through nmstate.

	net: Remove netupgrade module
	Since 4.4 every host has to be installed as a fresh host.
	There is no upgrade path from older hosts
	to the new ones using el8. Considering this fact,
	we no longer need the network upgrade, as the
	configuration created by a new host is up to date.

	net: Remove ifacquire module
	Ifacquire module is not used anymore and can be
	completely removed.

	net: Remove unused code from dns module

	net: Remove is_nmstate_backend function
	All references to the is_nmstate_backend are resolved.
	It is now safe to remove this function completely.

	net: Update cache and remove reference to is_nmstate_backend
	By removing is_nmstate_backend, reference to dns
	and dhclient module can be removed as well. To
	keep it compatible with the dhcp_info from dhclient
	we need to create default for every network before
	getting the real info.

	net: Remove legacy_switch module and its references

	net: Move legacy validation to validator
	The validation function was last part of the code that
	was used from legacy switch. After this move the whole module
	can be removed.

	net: Remove unused dhclient_monitor
	Remove dhclient_monitor and any related code
	that was calling this module.

	At the same time remove all the static files that
	were used by this monitor.

	net: Remove ifcfg reference from net-restore
	Ifcfg is not used anymore for network configuration
	so any reference to it from net-restore can be
	removed as well.

	net: Remove old OvS netrestore
	Remove the old ovs netrestore and all references
	from vdsm client. At the same time we can clean
	some other unused code from configurator.

2020-11-01  Nir Soffer  <nsoffer@redhat.com>

	tests: Add helper for creating volume
	Most nbd tests repeat verbose and uninteresting code to create a
	volume for exporting via the nbd server. Add a helper to create a
	single volume to clean up the tests.

	The new helper sets qcow2_compat to "1.1"; previous tests always used
	"0.10", which is relevant only to exporting volumes to export domains.

	tests: Add nbd test helpers
	The nbd tests contain too much repeated code for uploading an image
	to the nbd server, downloading an image from the nbd server, and
	comparing images. Add simple helpers to clean up the tests.

	tests: Add id for discard parameter
	Show "discard" and "no_discard" instead of True and False in the test
	name. This makes it easier to work with the tests.

2020-10-29  Nir Soffer  <nsoffer@redhat.com>

	constants: Clean up type2name, name2type
	First replace direct access of sc.VOLUME_TYPES with sc.type2name(), and
	make the internal dict private.

	type2name() had wrong error handling, returning None on IndexError.
	But IndexError is never raised when accessing a dict, so we know that
	the current code works with KeyError. Remove the error handling so
	accessing with a bad type will raise.

	name2type() had correct error handling, but looking at the few places in
	the code using it, we really want to fail loudly with KeyError and avoid
	returning None.

	Simplify the implementation of name2type() to convert the name to
	uppercase only once (instead of once per value) and do a single dict
	lookup instead of a linear search.

	With these changes we can use readable parameters like "sparse" and
	"preallocated" in the nbd tests.
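	A minimal sketch of the simplified helpers (the dict contents here
	are illustrative, not vdsm's actual constants):

```python
_VOLUME_TYPES = {1: "PREALLOCATED", 2: "SPARSE"}  # illustrative values
_NAME2TYPE = {name: code for code, name in _VOLUME_TYPES.items()}

def type2name(vol_type):
    # No error handling: a bad type raises KeyError loudly.
    return _VOLUME_TYPES[vol_type]

def name2type(name):
    # Uppercase once, then a single dict lookup (no linear search).
    return _NAME2TYPE[name.upper()]
```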

	constants: Add str2fmt helper
	First replace direct access of sc.FMT2STR dict with the existing helper
	sc.fmt2str(), and make the dict private.

	Add a str2fmt() helper to convert a format name (e.g. "qcow2") to the
	internal format number. This makes the tests easier to work with.

	With these changes we can use readable format names like "raw" and
	"qcow2" in the nbd tests.

	tests: Remove unused monkeypatch
	The fixture does not use monkeypatch; remove it.

	nbd: Fix compatibility with older engines
	When backing_chain support was added in:

	commit 372ffa7f0b6343fa86fe17163f9062d4a5edbe67
	Author: Nir Soffer <nsoffer@redhat.com>
	Date:   Wed Sep 16 12:37:58 2020 +0300

	    nbd: Allow exporting a volume without the backing chain

	backing_chain property was added with default=True:

	    backing_chain = properties.Boolean(default=True)

	This works fine if the backing_chain property is not initialized, but
	when the caller did not specify the value, it was initialized to None:

	    self.backing_chain = config.get("backing_chain")

	Setting property to None overrides the default value, so the property
	became None, which is treated as False later, leading to wrong NBD
	export returning only the specified volume instead of the entire chain.

	This breaks older engines that do not specify the backing_chain
	argument.

	Fixed by treating an unspecified value as True. This may be fixed in
	the properties module, but that requires much more work and thinking,
	and we need to fix this regression quickly.

	Bug-Url: https://bugzilla.redhat.com/1892403
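	The fix can be sketched as follows; the config-parsing context is
	simplified:

```python
def parse_backing_chain(config):
    """Treat an unspecified backing_chain as True: config.get()
    returns None for a missing key, and assigning None to the
    property would override its default=True."""
    value = config.get("backing_chain")
    return True if value is None else value
```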

	tests: Test NBD.start_server backing chain support
	This feature was added in 4.4.3 without a test. Add the missing test
	before we change the server for supporting cold backup.

	Add 2 tests for exporting an image chain. One exporting the entire
	chain, and one using backing_chain=False, exporting only single volume
	from the chain.

	The new tests reveal a bug with unspecified backing_chain; the value
	becomes None, and None is treated as False. This does not affect
	engine 4.4.3, since it always specifies the backing_chain, but older
	engines that do not specify the new parameter will get an incorrect
	NBD export.

	Bug-Url: https://bugzilla.redhat.com/1892403

	tests: Modernize nbd tests
	Use qcow2 for src and dst images, so differences in file systems cannot
	affect the tests. Our test pipeline is now:

	    src (qcow2) -> volume (qcow2 or raw) -> dst (qcow2)

	To write data to the source image, use qemu-io instead of writing data
	manually.

	To compare downloaded data, use qemu-img compare. This needs less
	code and supports comparing allocation, which was not tested before.

	Increase the test image size to 10 MiB, since recent qemu-img
	configures XFS to use a 1 MiB extent size. Testing with a bigger size
	ensures that we also test unallocated areas.

	Bug-Url: https://bugzilla.redhat.com/1892403

2020-10-27  Nir Soffer  <nsoffer@redhat.com>

	qemuimg: Fix copy to raw preallocated volume
	In qemu-img 4.2.0-29 and 5.1.0, converting an image with the -n
	option to a preallocated file makes the file sparse. We cannot fix
	this with qemu-img 4.2, but in 5.1.0 we can use the new
	--target-is-zero option. With this option qemu skips unallocated
	areas during convert, so the target image remains preallocated.

	Add a Volume.zero_initialized() class method, returning True if a new
	volume is always zeroed. This is always true for file-based volumes
	and always false for block-based volumes.

	Add a new target_is_zero argument to qemuimg.convert(). When qemu-img
	supports --target-is-zero and the image does not have a backing file,
	this adds the --target-is-zero option.

	Since we still support old CentOS with qemu-img 4.2, add
	target_is_zero_supported() helper, and xfail_requires_target_is_zero()
	marker. The tests use this marker to mark the relevant test cases as
	expected failure when running the tests with older qemu-img version.

	When converting images in copy_data job and image.py, use target_is_zero
	for zero initialized destination volume.

	Add a new test for converting to a raw preallocated image, ensuring
	that the target image remains preallocated after converting.

	Using --target-is-zero also fixes the issue of unwanted allocation when
	the target is a qcow2 image. The existing tests can use strict compare
	again.

	The failing tests for copy_data are fixed now, except the variants using
	fake "block" volumes. Since these tests are useless I dropped them.

	Bug-Url: https://bugzilla.redhat.com/1891520
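	The condition above can be sketched as follows. The helper name and
	signature are illustrative only; vdsm's real qemuimg.convert() differs:

```python
def convert_cmd(src, dst, dst_format, target_is_zero=False, backing=None):
    """
    Build a qemu-img convert command line (hypothetical helper
    sketching the --target-is-zero logic described above).
    """
    cmd = ["qemu-img", "convert", "-t", "none", "-T", "none",
           "-O", dst_format, "-n"]
    # --target-is-zero is only valid for a target without a backing
    # file, and only with qemu-img >= 5.1.0 (checked elsewhere).
    if target_is_zero and backing is None:
        cmd.append("--target-is-zero")
    cmd.extend([src, dst])
    return cmd
```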

2020-10-27  Benny Zlotnik  <bzlotnik@redhat.com>

	storage: add prepared parameter to CopyDataEndpoint
	Since 4.4, LSM uses copy_data as well, and because of this prepare and
	teardown are unnecessary (and teardown will fail) when the command is
	executed on the same host that runs the VM.

	prepared=True will be sent by the engine when the volume is expected
	to be already prepared on the host running the command.

	Bug-Url: https://bugzilla.redhat.com/1833780

2020-10-27  Ales Musil  <amusil@redhat.com>

	automation: Fix vdsm container for Travis
	The integration tests require a new dependency which
	was not included in the Travis container image.

2020-10-26  Nir Soffer  <nsoffer@redhat.com>

	tests: Require qemu-img
	The tests use qemu-img but the package was not required. We probably got
	it by luck via the python3-libvirt requirement.

	Specify an exact version in the Dockerfile to avoid caching issues in
	quay.io.

	tests: Require ovirt-imageio-client 2.1.1
	We want to use client.extents() in the tests, introduced in 2.1.1.

	Looks like we have a caching issue in quay.io, and we never get new
	releases. Mitigate this by specifying an exact package version.

	To ensure that we always test with the latest imageio build, add the
	ovirt-imageio-preview repo.

	tests: Mark failing copy_data tests
	It seems that recent changes in qemu-img after 4.2.0-19 cause the
	target image to be sparse when using the -n option.

	Here is an example flow:

	$ qemu-img create -f raw -o preallocation=falloc src.raw 1m
	Formatting 'src.raw', fmt=raw size=1048576 preallocation=falloc

	$ qemu-img create -f raw -o preallocation=falloc dst.raw 1m
	Formatting 'dst.raw', fmt=raw size=1048576 preallocation=falloc

	$ qemu-io -f raw -c 'write -P 240 0 65536' src.raw
	wrote 65536/65536 bytes at offset 0
	64 KiB, 1 ops; 00.06 sec (1005.353 KiB/sec and 15.7086 ops/sec)

	$ ls -lhs
	total 2.0M
	1.0M -rw-r--r--. 1 nsoffer nsoffer 1.0M Oct 25 19:49 dst.raw
	1.0M -rw-r--r--. 1 nsoffer nsoffer 1.0M Oct 25 19:49 src.raw

	$ qemu-img convert -t none -T none -f raw -O raw -n src.raw dst.raw

	$ ls -lhs
	total 1.1M
	 64K -rw-r--r--. 1 nsoffer nsoffer 1.0M Oct 25 19:46 dst.raw
	1.0M -rw-r--r--. 1 nsoffer nsoffer 1.0M Oct 25 19:46 src.raw

	This is a real issue in the system we need to fix, but we need to fix
	the build first. Add a new marker for this issue, and mark the affected
	tests as expected failures.

	Bug-Url: https://bugzilla.redhat.com/1891520

	tests: Fix convert test with qemu-img 5.1.0
	Looks like qemu-img convert zeroes the destination image when it does
	not have a backing file, instead of leaving the unallocated areas
	unallocated.

	Here is an example showing the issue:

	Create source and destination images:

	$ qemu-img create -f qcow2 src.qcow2 10m
	Formatting 'src.qcow2', fmt=qcow2 cluster_size=65536
	compression_type=zlib size=10485760 lazy_refcounts=off
	refcount_bits=16

	$ qemu-img create -f qcow2 dst.qcow2 10m
	Formatting 'dst.qcow2', fmt=qcow2 cluster_size=65536
	compression_type=zlib size=10485760 lazy_refcounts=off
	refcount_bits=16

	Write data to source:

	$ qemu-io -f qcow2 -c 'write -P 240 0 65536' src.qcow2
	wrote 65536/65536 bytes at offset 0
	64 KiB, 1 ops; 00.01 sec (7.355 MiB/sec and 117.6809 ops/sec)

	Compare images:

	$ qemu-img compare src.qcow2 dst.qcow2
	Images are identical.

	$ qemu-img compare src.qcow2 dst.qcow2 -s
	Strict mode: Offset 65536 block status mismatch!

	Using imageio client, we can see that the image allocation is different:

	$ python
	>>> from ovirt_imageio import client
	>>> from pprint import pprint
	>>> pprint(list(client.extents("src.qcow2")))
	[ZeroExtent(start=0, length=65536, zero=False, hole=False),
	 ZeroExtent(start=65536, length=10420224, zero=True, hole=True)]
	>>> pprint(list(client.extents("dst.qcow2")))
	[ZeroExtent(start=0, length=65536, zero=False, hole=False),
	 ZeroExtent(start=65536, length=10420224, zero=True, hole=False)]

	The second extent is a hole in the source image, but not a hole in the
	destination image.

	Change the tests to use non-strict compare when converting an image
	without a backing chain, and use strict compare only when converting to
	destination image with a backing file.

	When comparing the top volume, remove the backing file so we can compare
	only the top volume. This allows strict compare for the top volume.

	Since the copy collapsed test no longer assumes an empty image, write
	data to the source to make sure we convert the images correctly.

	tests: Fix sparse image allocation check on XFS
	Older qemu-img always allocates the first file system block (4096
	bytes). Newer qemu-img configures XFS to use 1 MiB extents, so the
	minimal allocation is 1 MiB. Change the test to work with both versions.

	tests: Fix untrusted image verification tests
	Older versions of qemu-img did not access the backing file when creating
	a new image if image size was specified. Current version (5.1.0) tries to
	access the backing file and fails with:

	    Could not open '/path/to/file': No such file or directory
	    Could not open backing image.

	Change the tests to create the backing file, and use "qemu-img rebase"
	to change the backing file instead of "qemu-img create".

	tests: Fix check for qemu-img bitmap support
	qemu-img bitmap is available in qemu-img 4.2.0-29 on CentOS 8.2, but
	--merge works only in RHEL 8.3.0 nightly (qemu-img 5.1.0). Since we use
	bitmap operations only in cluster version 4.5, requiring libvirt
	6.6 (which requires qemu-img 5.1), it is simpler to allow bitmap
	operations only with qemu-img 5.1.

	To make our life more interesting, --merge does not work in qemu-img
	5.1.0 on Fedora, breaking the tests when running locally.

	Change the bitmaps_supported() check to use qemu-img version, and add a
	new @requires_bitmaps_merge_support marker to skip merge tests on
	Fedora. Tests for merging bitmaps use the new marker.

	We will be able to eliminate these checks when CentOS 8.3 is released
	and qemu-img 5.2 is available in Fedora.
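	The version-based check can be sketched like this (illustrative
	parsing only; the real bitmaps_supported() helper in the tests may
	differ):

```python
import re


def qemu_img_version(output):
    """Parse "qemu-img version X.Y.Z ..." output into a tuple."""
    m = re.search(r"version (\d+)\.(\d+)\.(\d+)", output)
    if m is None:
        raise ValueError("Cannot parse qemu-img version: %r" % output)
    return tuple(int(x) for x in m.groups())


def bitmaps_supported(version):
    # Allow bitmap operations only with qemu-img >= 5.1.0.
    return version >= (5, 1, 0)
```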

2020-10-26  Ales Musil  <amusil@redhat.com>

	net, tests: Clear container dependencies
	Remove dependencies that are specifically required by the
	vdsm spec file. Because we are installing the vdsm-network
	package, all those dependencies will be installed as well.

2020-10-22  Ales Musil  <amusil@redhat.com>

	net, service: Update network-service requirements
	network-service should depend on NetworkManager and
	openvswitch services. Because we are using purely
	nmstate to configure networks, it is required to have
	NetworkManager running before we attempt to do any
	configuration.

	net, spec: Require OvS package for vdsm-network
	OvS package was not required on platforms like ppc.
	Now it should be built for ppc as well.

2020-10-21  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.35

2020-10-20  Eyal Shenitzky  <eshenitz@redhat.com>

	sdm/api/merge.py: merge bitmaps in merge job
	If a snapshot contains QEMU bitmaps and it's been removed,
	the bitmaps will not be merged into the base volume, which means
	that the next backup operation on the base volume will fail.

	This patch adds the logic to merge bitmaps from the removed top
	volume into the base volume during SDM merge. Bitmaps will be merged
	only if the base volume is in 'qcow2' format and the engine
	requests to merge the bitmaps (merge_bitmaps=True).

	Bug-Url: https://bugzilla.redhat.com/1861673

2020-10-20  Marcin Sobczyk  <msobczyk@redhat.com>

	Revert "tool: libvirt: Stop libvirt sockets on reconfiguration"
	This reverts commit f85b3fe76a2b7e6a0fdbe7ba504d17d251cf01bd.

	We need to address the blocking bug on virt-who side first [1].

	[1] https://bugzilla.redhat.com/show_bug.cgi?id=1889363

2020-10-20  Vojtech Juranek  <vjuranek@redhat.com>

	tests: don't fake VM sync_metadata() method
	In follow-up patches we need to work with metadata, and there's no need
	to fake this method. Remove its override from the fake VM.

2020-10-20  Eli Mesika  <emesika@redhat.com>

	use fence agents without telnet dependency on RHEL
	Make the dependency on the fence-agents package conditional.
	For RHEL 8.x it will use a fence-agents package that does not depend on
	telnet.
	For other platforms/versions it will use the old fence-agents package
	that depends on telnet.

	Bug-Url: https://bugzilla.redhat.com/1729222
	Bug-Url: https://bugzilla.redhat.com/1835650

2020-10-20  Marcin Sobczyk  <msobczyk@redhat.com>

	yajsonrpc: Don't log fast, successful RPC calls
	According to the bug reporter, "RPC call ... succeeded in ..."
	messages can take up to almost 20% of the vdsm log. Since calls that
	end successfully and take less than 1.0s are rarely interesting,
	this patch disables logging the information about their status.

	Bug-Url: https://bugzilla.redhat.com/1751520

2020-10-19  Andrej Cernek  <acernek@redhat.com>

	net, tests: remove nmstate backend check
	Since network tests now use only the nmstate backend, we no longer
	need to check whether it is in use, nor handle the special case when
	it is.

	net, tests: use nmstate in integration tests
	Since the ability to not use nmstate as the backend is removed, we
	should run integration tests on nmstate as well. On CI we need to
	install NetworkManager and nmstate, at least until the suite runs in
	a container.

	net, tests: move CI check to common lib

2020-10-19  Ales Musil  <amusil@redhat.com>

	net: Move netswitch util to new common module
	The common module should house everything that is isolated
	enough to not depend on any other code except something from
	common.

	This should prevent nmstate circular dependency issues that
	were not detected before.

2020-10-17  Dan Kenigsberg  <danken@redhat.com>

	network.link.bridge: rename variable
	BR_KEY_BLACKLIST holds the bridge options we do not want to read or
	write. It is not particularly black, nor is it a list - it is actually
	a tuple. This patch renames it to SKIPPED_BRIDGE_OPTIONS.

2020-10-15  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: consume guest-get-disks command
	We can now get the list of all disks with the new guest-get-disks
	command. This also includes disks without a mounted filesystem. Since
	this is a new feature, we don't want to rely solely on this new
	command, as it is likely it won't be supported by most guests in the
	foreseeable future. We first try to fetch info using this command; if
	this is successful, the information is kept. Otherwise we fall back to
	the previous way and gather the information from guest-get-fsinfo data.

	Bug-Url: https://bugzilla.redhat.com/1836661

2020-10-15  Amit Bawer  <abawer@redhat.com>

	vmfakelib: Add IRS methods for live merge testing
	Add volume info, teardown and image sync volumes chain methods.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

2020-10-15  Dan Kenigsberg  <danken@redhat.com>

	storage.hsm: rename whiteList variable and function
	The purpose of "whiteList" in __cleanStorageRepository() is not apparent
	by its name. In fact, it is a list of glob patterns of files we should
	keep as they are. This patch renames the whiteList to KEEP_PATTERNS, to
	make it more apparent that it is actually a constant listing globs of
	files we want to keep on disk. Similarly, the patch renames the function
	that checks if a file is covered by the globs and thus should be kept.

	__cleanStorageRepository() begs for a more thorough refactoring. E.g.,
	I see no reason to modify "dirs" while iterating over it. However, this
	patch intentionally limits itself to name changes only.

2020-10-15  Eyal Shenitzky  <eshenitz@redhat.com>

	bitmaps.py: extract add bitmap logic to a function
	Add bitmap logic extracted to a function to re-use in
	a later patch.

	Also, added a function to query and filter bitmap for volume.

	Bug-Url: https://bugzilla.redhat.com/1861673

2020-10-14  Shani Leviim  <sleviim@redhat.com>

	API: Introduce switchMaster verb
	The StoragePool.switchMaster operation will be used to switch the
	master role manually to the selected storage domain.

	Bug-Url: https://bugzilla.redhat.com/1576923

2020-10-14  Amit Bawer  <abawer@redhat.com>

	API: Add teardownVolume to public api
	This fixes an old TODO for an API call for tearing down
	the top volume after live merge is done and will make
	testing easier.

	Bug-Url: https://bugzilla.redhat.com/1796415
	Bug-Url: https://bugzilla.redhat.com/1796124

2020-10-14  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: Use 'backup_mode' property for each disk in the backup
	There are two supported backup modes:

	  - 'full':
	        should be used for 'raw' disks and for
	        newly added disks that weren't part of a
	        previous checkpoint.

	  - 'incremental':
	        should be used for disks that were
	        part of a previous backup; this mode
	        can be used only when taking an
	        incremental backup.

	This patch uses the newly reported backup_mode that the engine provides.
	For backward compatibility, if the backup mode isn't reported by the
	engine, the backup XML will not include it, and the backup will not be
	able to contain both 'full' and 'incremental' backups under the same
	checkpoint.

	Bug-Url: https://bugzilla.redhat.com/1861674

	backup.py: add backup_mode property for each backup disk
	Add 'backup_mode' = 'full'/'incremental' for each disk that
	participates in a VM backup.
	This information will guide VDSM on the type of backup that should
	be taken for each disk.

	This new property will give the option to create a checkpoint that
	contains both 'full' and 'incremental' backup disks.

	So now the request data from Engine to VDSM contains the following
	structure:
	{
	    "backup_id": "backup-1",
	    "disks": [
	        {
	            "img_id": "disk1",
	            "domain_id": "domain1",
	            "volume_id": "volume1",
	            "checkpoint": false,
	            "backup_mode": "full"
	        },
	        {
	            "img_id": "disk2",
	            "domain_id": "domain2",
	            "volume_id": "volume2",
	            "checkpoint": true,
	            "backup_mode": "incremental"
	        }
	    ],
	    "from_checkpoint_id": "from_checkpoint_id",
	    "to_checkpoint_id": "to_checkpoint_id"
	}

	The 'backup_mode' property will be used in the follow-up patch.

	Bug-Url: https://bugzilla.redhat.com/1861674
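	The per-disk mode selection described above can be sketched as
	follows (a hypothetical helper, not vdsm's actual code):

```python
def disk_backup_mode(disk, from_checkpoint_id):
    """
    Pick the backup mode for one disk dict from the engine request.
    Raw disks and newly added disks must use a full backup; only disks
    included in a previous checkpoint may be backed up incrementally.
    """
    explicit = disk.get("backup_mode")
    if explicit is not None:
        # Engine >= 4.4.3 reports the mode per disk.
        return explicit
    # Backward compatibility: older engines do not report backup_mode.
    # Fall back to incremental only when this is an incremental backup
    # and the disk participates in the checkpoint.
    if from_checkpoint_id is not None and disk.get("checkpoint"):
        return "incremental"
    return "full"
```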

2020-10-14  Vojtech Juranek  <vjuranek@redhat.com>

	tests: add fake prepare and tear down volume methods
	Add fake prepareImage() and teardownImage() methods which will later be
	used for testing new flows for changing CDs located on a block SD.

	Also add a test for changing a CD on a block device. This flow is
	currently not used, as a CD is always specified as a path and not as
	PDIV. This is now mainly to test the newly added fake methods.

	tests: move vmfakelib into virt package
	vmfakelib module is mostly related to virt stuff and should be moved
	there.

	tests: make vmfakelib_test tests simple functions
	Remove test class from vmfakelib_test and make tests simple functions.

	tests: convert vmfakelib_test to pytest

2020-10-14  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.34

2020-10-13  Eyal Shenitzky  <eshenitz@redhat.com>

	API.py: Add new merge_bitmaps flag to SDM merge() verb
	Now, qemu-img supports the 'bitmap' sub-command to perform
	operations on bitmaps that were created during a backup operation.

	A new flag is added to the SDM merge() verb; this flag instructs
	the host whether all the bitmaps from the top volume should be merged
	into the base volume.

	This patch only adds the API for merging the bitmaps when a
	snapshot is removed while the VM is down.
	The following patches will implement merging the bitmaps for
	file and block volumes.

	Bug-Url: https://bugzilla.redhat.com/1861673

	qemuimg.py: add support for enabling/disabling a bitmap
	Add support for enabling/disabling a bitmap.

	Disabling a bitmap removes the "AUTO" flag from the bitmap flags;
	enabling the bitmap adds the "AUTO" flag back.

	Bug-Url: https://bugzilla.redhat.com/1861673
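	A minimal sketch of building the underlying command line (the helper
	is hypothetical; qemu-img >= 5.1 with the --enable/--disable bitmap
	options is assumed):

```python
def bitmap_cmd(filename, bitmap, enable):
    """
    Build a `qemu-img bitmap` command line toggling the recording
    ("auto") state of a persistent bitmap in a qcow2 image.
    """
    op = "--enable" if enable else "--disable"
    return ["qemu-img", "bitmap", op, filename, bitmap]
```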

2020-10-13  Nir Soffer  <nsoffer@redhat.com>

	properties: Allow None default value for Enum
	If the Enum is not set, the value will be None. Previously the only way
	to do this was to add None to the valid values.

	properties: Convert tests to pytest

2020-10-13  Ales Musil  <amusil@redhat.com>

	net, service: Remove vdsm-network-init service
	The purpose of this service was to restore the IP and link
	configuration of OvS networks. This is not needed anymore,
	as the regular net restoration that we have works universally
	with linux bridge and ovs.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, service: Update vdsm-network service unit
	By moving the dumping of bond options to vdsm-network,
	we can remove the dependency on vdsm-network-init.

	vdsm-network-init was used to restore legacy ovs
	networking. This is not needed anymore, because
	net-restore is capable of restoring both linux
	bridge and ovs as they are both using nmstate.

	Bug-Url: https://bugzilla.redhat.com/1809102

2020-10-12  Nir Soffer  <nsoffer@redhat.com>

	clientIF: Fix reactor life cycle
	Previously we created the reactor when initializing clientIF, and
	started the reactor thread in clientIF.start(). However, the reactor was
	closed from the acceptor. This works since we stop both the acceptor and
	reactor during shutdown, but it is a wrong design, a leftover from the
	time the reactor was created by the acceptor.

	Things should be started and stopped in the same place. If clientIF is
	starting the reactor, it should also stop it. The reactor cannot depend
	on the acceptor using it.

	This patch fixes the wrong dependency by stopping the reactor in
	clientIF.prepareForShutdown.

2020-10-12  Dan Kenigsberg  <danken@redhat.com>

	bridge_test: rename apiWhitelist
	apiWhitelist is badly named. It is in camel case; it does not show that
	it is a local variable; it does not show that it is constant; it claims
	to be a list, while it is actually a tuple; and it is not clear what
	its purpose is.

	This patch renames it to _COPIED_API_OBJECTS to fix the above issues.

2020-10-12  Dominik Holler  <dholler@redhat.com>

	virt, net: support isolated port in updateDevice
	This patch adds an optional parameter to control port
	isolation on updates of plugged virtual NICs.

	Bug-Url: https://bugzilla.redhat.com/1725166

2020-10-11  Vojtech Juranek  <vjuranek@redhat.com>

	tests: add real clientIF into cd_test
	To avoid having to copy and paste parts of the real clientIF into the
	fake one, add another implementation into the cd_test module which
	inherits from the real clientIF and only overrides the constructor to
	use a fake IRS.

	Also use the real clientIF in the cd_test module.

2020-10-11  Dan Kenigsberg  <danken@redhat.com>

	storage.misc.walk: name skipped paths appropriately
	misc.walk() is asked to os.walk(), but to never get into particular
	paths that the caller would like to avoid. These unwanted paths do not
	have a particularly low albedo value; they are not really black. And
	they do not necessarily come as a list. So let us name the argument for
	what it is: a bunch of stuff we want to skip.

2020-10-07  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.33

2020-10-07  Ales Musil  <amusil@redhat.com>

	net, ovs: Enforce MAC of base interface
	When vdsm takes over an existing interface, it should ensure
	that the MAC address is cloned to the OvS interface. This way
	the IP address obtained from the DHCP server will remain
	the same as it was configured on the base interface.

	This was causing issues mostly for host deploy flows.
	The Engine was not able to communicate with the host because
	the IP address had changed.

	The only exception that is not covered is when we create a
	bond and a network on top of that bond in the same request.
	This scenario is complicated by the nature of some bond
	modes, and it is hard to reliably determine which MAC address
	will be assigned to the newly created bond.

	Bug-Url: https://bugzilla.redhat.com/1809102

2020-10-06  Eyal Shenitzky  <eshenitz@redhat.com>

	qemuimg.py: add support for 'qemu-img bitmap --merge' command
	Add new 'qemu-img bitmap --merge' command.

	This operation is needed to merge bitmaps
	during cold merge operations.

	Bug-Url: https://bugzilla.redhat.com/1861673
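	A sketch of building the merge command line (hypothetical helper;
	argument order and the -b option for a separate source file follow
	the qemu-img 5.1 bitmap sub-command, which is assumed here):

```python
def bitmap_merge_cmd(src_bitmap, dst_file, dst_bitmap, src_file=None):
    """
    Build a `qemu-img bitmap --merge` command line, merging src_bitmap
    into dst_bitmap in dst_file. When src_file is given, -b names the
    file holding the source bitmap (e.g. the removed top volume).
    """
    cmd = ["qemu-img", "bitmap", "--merge", src_bitmap]
    if src_file is not None:
        cmd.extend(["-b", src_file])
    cmd.extend([dst_file, dst_bitmap])
    return cmd
```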

	copy_data.py: support copying bitmaps when copying volume data
	Add support for copying all the bitmaps from the source volume to the
	destination volume using the '--bitmaps' flag of 'qemu-img convert'.

	Bug-Url: https://bugzilla.redhat.com/1861671

	spec: Update qemu-kvm requirement for RHEL AV 8.3
	Update qemu-kvm requirement to consume the fix for
	bug 1877209 in RHEL-8.3.

	Bug-Url: https://bugzilla.redhat.com/1861673

2020-10-06  Nir Soffer  <nsoffer@redhat.com>

	resourcemanager: Don't repeat module name in class name
	Using rm.ResourceManagerLock() when creating lock lists is too verbose
	and annoying to work with. Rename to Lock() since this object implements
	a lock interface. Users of this object should not care about the type,
	only about its behaviour.

	sp: Cleanup logging in masterMigrate
	Cleaner and more consistent logging, to make it easier to debug this
	area for supporting online switching of the master domain.

	Bug-Url: https://bugzilla.redhat.com/1080097

2020-10-06  Vojtech Juranek  <vjuranek@redhat.com>

	tests: move tests related to CD into separate module
	Test module for vm module is huge. Move tests related to CDROM into
	separate module and convert them into pytest.

	Bug-Url: https://bugzilla.redhat.com/1589763

2020-10-06  Marcin Sobczyk  <msobczyk@redhat.com>

	tool: libvirt: Stop libvirt sockets on reconfiguration
	When reconfiguring libvirtd with vdsm-tool, we stop the
	'libvirtd.service'. This however is not enough - since libvirt
	switched to socket activation, even when the main 'libvirtd.service'
	daemon is not running, one can respawn it by making a connection
	to one of its sockets.

	This patch adds libvirt's sockets to the list of units that are being
	stopped and started during reconfiguration so we can avoid this issue.

	Bug-Url: https://bugzilla.redhat.com/1878724

	ci: Drop fc30
	Since we officially dropped support for Fedora, there's no reason
	to have these fc30 pipelines and dockerfiles any longer.

	The remaining fc30 pipelines are networking functional tests which
	require docker/podman. Since we don't yet have a solution for
	this on el8, we're keeping them for now.

	ci: Move linters to el8
	Fedora 30 hasn't been supported for quite some time -
	this patch moves the linters CI substage to use el8 instead.

2020-10-06  Andrej Cernek  <acernek@redhat.com>

	net: remove deprecated net_persistence option
	The ifcfg value of the net_persistence option has been deprecated
	since 4.1, so we can remove this option altogether now.

2020-10-05  Milan Zamazal  <mzamazal@redhat.com>

	virt: Fix migration destination detection in recovery
	When VM status recovery detects a paused VM in migration, it assumes
	that it's an outgoing migration.  But it can also be an incoming
	migration, which is actually much more likely, because a VM is paused
	only for a short time (unless it's a post-copy migration, which is
	handled at a different place) on the source, while it's paused for a
	long time on the destination.

	Let's distinguish between source and destination by examining the
	migration job, which is present only on the source.  The implemented
	solution is still racy and can cause trouble, but only under the
	following circumstances (all the conditions must happen):

	- Vdsm gets restarted on the migration source during a migration.
	- The recovery is invoked during the short time the VM is paused for
	  the final stage of a pre-copy migration.
	- The migration fails.
	- The migration job disappears between checking the VM status and the
	  immediately following job check.

	This scenario is very unlikely and can be remedied by a Vdsm restart if
	it ever happens, so it's probably not worth complicating the code.

	Bug-Url: https://bugzilla.redhat.com/1877632

2020-10-05  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: bring the VM up in engine after boot
	Notify engine that the VM is up once we successfully contact qemu-ga.

	Bug-Url: https://bugzilla.redhat.com/1834233

	qga: invalidate qemu-ga capabilities on reboot
	This lets us pick up new capabilities after the reboot. In most cases
	the capabilities will be the same, but this way we can handle unusual
	situations. For example when the agent is updated shortly before reboot
	or VMs with multi-boot.

	It will also allow us to detect the UP state properly in the
	following patch.

	Bug-Url: https://bugzilla.redhat.com/1834233

2020-10-05  Dan Kenigsberg  <danken@redhat.com>

	misc_test: test misc.walk()

2020-10-05  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: mark guest-info check from single location
	Instead of storing the information in every place where we call the
	guest-info command, it is better to do it only once, from the
	_qga_capability_check() method. The patch changes the original behavior
	slightly -- in the _on_boot() function the information is now stored
	even in case of failure. This is in fact correct. The old behavior was
	wrong and did not match the rest of the poller.

	Bug-Url: https://bugzilla.redhat.com/1834233

	qga: add is_active method
	Add a method indicating if qemu-ga is active in the guest. We don't
	have an accurate way of knowing if the agent is responsive, but this at
	least tells us if the agent responded the last time we queried it.

	Bug-Url: https://bugzilla.redhat.com/1834233

	qga: treat None in capabilities same in all places
	Callers of get_caps() and update_caps() should be able to pass None
	instead of a dictionary with capabilities, and it should be treated
	equally to the already initialized dictionary that we use to represent
	"no capabilities" (or "no qemu-ga seen").

	Bug-Url: https://bugzilla.redhat.com/1834233

2020-10-05  Milan Zamazal  <mzamazal@redhat.com>

	mkimage: Don't fail on old floppy payload paths
	In commit cbad9d144d838f11003cf86621e60c9129120e6a, Vdsm started using
	/run instead of /var/run.  However, that directory is also used for
	payload files and we must honor the old location on incoming
	migrations.

	We have ensured in commits 67531d70166ae13a430703972ae499e7a0888c7a
	and dbe4fd2cc67bccb40a2699717341b774f3b53bc9 that payload paths are
	transferred correctly on migrations from/to Vdsm < 4.4.  We just need
	to address the change in the check of a floppy payload path.

	Migrations with a floppy payload will still fail when migrating from
	a host >= 4.4.3 to a 4.4 <= host < 4.4.3.  This can be remedied by
	updating the older 4.4 hosts.

	Bug-Url: https://bugzilla.redhat.com/1883446

2020-10-05  Liran Rotenberg  <lrotenbe@redhat.com>

	virt: fix DIMM removal
	DIMM devices are not saved in VDSM. Libvirt removes
	the device, but since we don't save it, the removal results in a
	warning message and the engine is not updated.

	This patch saves the DIMM device and checks it on the device removal
	event. It triggers the update to the engine and won't show a
	false warning. It takes care of updating the engine in cases where the
	device removal wasn't executed by VDSM or when VDSM restarted in the
	middle of the action.

	Bug-Url: https://bugzilla.redhat.com/1883483

2020-10-05  Ales Musil  <amusil@redhat.com>

	net, tests: Enable additional OvS functional tests
	Enable tests that should work out of the box without
	any adjustments.

	net: Unify legacy and ovs vlan report in netinfo
	Currently only ovs reports vlanid in network netinfo.
	In order to unify the format and data reported, the
	legacy switch will report the vlanid property as well.

	net, ovs: Manage DNS for OvS networks
	Add code to generate and manage DNS state for
	OvS networks.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Move DNS config to its own module
	In order to reuse the code between linux bridge and
	ovs, move the DNS code to its own module.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, ovs: Manage routes for OvS networks
	Add code to generate and manage route state for
	OvS networks.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Add support for OvS to route module
	Getting the next-hop interface was not aligned with OvS.
	Add a check for OvS and return the corresponding interface.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, tests: Enable switch type change tests
	Switch type change should be possible for basic scenarios.
	Enable functional tests to test all provided scenarios.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, ovs: Add source routing support
	Add source route support for OvS. This can be
	achieved easily as the interfaces are managed by
	NetworkManager.

2020-10-05  Andrej Cernek  <acernek@redhat.com>

	net: remove unused code from restore net config

2020-10-04  Dan Kenigsberg  <danken@redhat.com>

	storage.hsm: fix typo

2020-10-02  Benny Zlotnik  <bzlotnik@redhat.com>

	tests: add tests for measure without backing file
	Bug-Url: https://bugzilla.redhat.com/1826365

	kvm2ovirt: move is_block_device to fileUtils
	Bug-Url: https://bugzilla.redhat.com/1826365

	storage: support measuring without backing file
	By introducing the new `withBacking` argument to Volume.measure, it is
	now possible to measure internal volumes and get the required size
	only for them, not the size of the entire chain including the
	backing files.

	This relieves us of having to "guess" the required size in the
	engine.

	Measure on a file Storage Domain:
	$ qemu-img measure --output json -O raw 'json:{"file": {"driver": "file", "filename": "/rhev/data-center/mnt/nfs:_root_storage__domains_sd1/bf5d4320-191d-4551-9c21-94745d1c6ec1/images/ef282a06-251b-4d17-a587-9e3337be6df2/0bae1526-f18f-4ede-98e8-2ec7c3e97f9e"}, "driver": "qcow2", "backing": null}'

	Measure on a block Storage Domain:
	$ qemu-img measure --output json -O raw 'json:{"file": {"driver": "host_device","filename": "/rhev/data-center/mnt/blockSD/9090d51a-fa55-478d-b77a-5ef07470ff3f/images/1d390146-6812-4405-9b40-ec7e7fe88d4d/4ee5265e-c0b5-4064-9ac5-b806fb3a0824"}, "driver":"raw"}'

	Bug-Url: https://bugzilla.redhat.com/1826365

2020-10-01  Ales Musil  <amusil@redhat.com>

	net, tests: Mark another unstable functional test
	test_create_network_over_an_existing_unowned_bridge is
	failing way too often on CI with a mainloop error.

2020-10-01  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Fixed py3 type errors in gluster/hooks.py
	While fetching gluster hook content, the file should be
	read in binary mode, since the caller expects a bytes-like object and
	not str. Also added a test for the same.

	Bug-Url: https://bugzilla.redhat.com/1858230

2020-09-30  Germano Veit Michel  <germano@redhat.com>

	dump-volume-chains: use storage.dump() api
	This tool can make good use of the new dumpStorageDomain() API.
	It will speed up data collection and lower the system overhead
	when doing it by using a single API instead of nested loops
	with image and volume related APIs.

	It will help avoid hitting the sos default timeout of 300s
	when running such commands, which can cause incomplete
	sosreports.

	Bug-Url: https://bugzilla.redhat.com/1839444

2020-09-30  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.32

2020-09-29  Vojtech Juranek  <vjuranek@redhat.com>

	lvm: add comment why we use -f with vgremove

	lvm: log when removing VG
	Add info log when removing VG.

	lvm: deactivate VG before removing it
	Before removing a VG, we should deactivate it. Deactivation invalidates
	all of the VG's LVs and also cleans up the VG mappings if there are any
	issues, ensuring we don't leave any stale LV links after VG removal.

	lvm: make removeVgMapping function private

	storage: cleanup DM mappings during VG deactivation
	It can happen (e.g. when the storage is not available) that the DM
	mappings for LVs are not removed by LVM (see [1]), and after block SD
	teardown there are still some stale DM links. The devices can later be
	used for other storage, and stale links pointing to these devices can
	lead to data corruption.

	Clean up DM mappings manually upon VG deactivation to make sure no
	stale DM mappings are left behind.

	[1] https://bugzilla.redhat.com/1881468

	Bug-Url: https://bugzilla.redhat.com/1863058

	storage: inline lvm._setVgAvailability() function
	The only user of the lvm._setVgAvailability() function is
	lvm.deactivateVG(). Inline _setVgAvailability() into that function and
	adjust it for deactivation of the VG.

2020-09-29  Eyal Shenitzky  <eshenitz@redhat.com>

	sd.py: validate domain version for bitmaps operations
	Bitmap operations cannot take place on storage domain versions < 4.

	Bitmap operations require qcow2 version 1.1, while on v3 storage
	domains the qcow2 version is 0.10.

2020-09-29  Nir Soffer  <nsoffer@redhat.com>

	sd: Change validateCreateVolumeParams to method
	StorageDomainManifest.validateCreateVolumeParams() was changed to a
	class method in:

	commit a28018598ac408b741b9f09ace3adea09c132d2a
	Author: Denis Chaplygin <dchaplyg@redhat.com>
	Date:   Mon Jan 28 16:54:24 2019 +0100

	    storage: Updated DiskType in VolumeMetadata

	To make it easier to test. This was fine at the time, but now we also
	want to check the storage domain version, which requires passing the
	domain version through multiple layers. Convert the method back to an
	instance method to make this easier.

2020-09-29  Ales Musil  <amusil@redhat.com>

	net, ovs: Manage IP stack for OvS networks
	Add code to manage IP configuration for OvS networks.
	This does not yet include management of routes.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, tests: Add subclass for basic OvS tests
	Every test in this module deals with non-IP setups. Moving them to a
	common class makes the suffix obsolete and leaves room for other tests.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, ovs: Fix faking of bridgeless
	During the type resolution of faked bridgeless netinfo attributes,
	the type was skipped and left as None, which consequently caused the
	netinfo not to be properly propagated to the SB interface.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, ovs: Fix typo in IP netinfo
	A copy-paste typo in the IP configuration netinfo caused the IPv4
	DHCP state to be reported instead of the IPv6 one.

	Bug-Url: https://bugzilla.redhat.com/1809102

2020-09-29  Andrej Cernek  <acernek@redhat.com>

	net, tests: remove NetworkManagerLegacy test cases
	Since all functional tests should soon run on nmstate backend we do not
	need to test Network Manager as well.

2020-09-29  Ales Musil  <amusil@redhat.com>

	net, automation: Let the CI fail if func tests fail
	The timeout for the dhcp response was increased due to slow bring-up
	on CI machines.

	net, tests: Mark unstable tests as xfail
	Three types of tests were marked as xfail on CI.

	The first being test_create_network_and_reuse_existing_owned_bridge
	which has unstable link on CI.

	Second are dynamic tests that require a DHCPv6 response, which in the
	current state occasionally fails to reach the client on CI.

	Third is the
	test_add_net_on_existing_external_bond_preserving_mac
	which fails with nmstate mainloop abort too often on CI.

	net, tests: Pass the CI flag to the functional test container
	The CI flag is an indicator for some tests that might be unstable
	on CI.

	net: Fix dhcp monitor to report only global IPv6
	Previously the monitor reported link-local IPv6 addresses, which
	caused an early caps refresh in which the global address might not
	yet be present. This was most noticeable on CI, where the response
	from the DHCP server can take significantly longer.

	net, tests: Clear monitored items pool after each test
	Some tests are not able to clear the pool by themselves. This applies
	to tests that run without a dhcp server and tests that time out on
	the dhcp monitor.

	net, tests: Fix dnsmasq logging

	net, tests: Use --no-ping with dnsmasq
	The ping is useful to check whether someone is already using the
	address. This is not needed for the tests because we set up only
	interfaces managed by us. Removing the ping improved the response
	time for DHCP requests significantly.

	net, tests: Use blocking dhcp as indicator for sync dhcp
	If blocking dhcp is specified, we should wait for the dhcp response.
	In some tests we do not actually need to wait for the result because
	the dhcp server is not running.

	Use the blocking dhcp and skip the wait in tests that do
	not need it. This will cut running time of functional tests.

2020-09-29  Bell Levin  <blevin@redhat.com>

	net, CI: remove common functions that were run by lago
	The common_network file was used in the past to run some network test
	stages in the CI with lago. We have since moved on to containers and
	a different way of testing, making this file obsolete.

2020-09-25  Eyal Shenitzky  <eshenitz@redhat.com>

	volume.py: add volume bitmaps after snapshot created
	If a volume contains QEMU bitmaps and a snapshot is created for that
	volume, the bitmaps are not copied to the new 'active' volume, which
	means that the next backup operation on that volume will fail.

	Call the 'qemu-img bitmaps' sub-command to add all the bitmaps
	from the source volume to the destination volume.

	New bitmaps.py helper introduced to include all bitmaps related
	logic.

	Bug-Url: https://bugzilla.redhat.com/1861667

	conftest.py: Add tmp_mount for fixture
	Move user_mount fixture from qemuimg_test.py to conftest.py
	so it can be used in other tests without importing it.
	Also, the fixture was renamed to tmp_mount according to conftest.py
	style.

2020-09-24  Nir Soffer  <nsoffer@redhat.com>

	nbd: Allow exporting a volume without the backing chain
	Add backing_chain argument to NBD server configuration. If false and
	using volume in qcow2 format, expose only the specified volume instead
	of the entire chain.

	If not specified or true, export the entire chain. This argument has no
	effect when volume format is raw.

	Allocated areas in the volume will be read as data (zero=False) or as
	zero (zero=True, hole=False). Unallocated areas will read as a hole
	(zero=True, hole=True).

	Here is an example for extents response:

	    {"start": 0, "length": 65536, "zero": False, "hole": False},
	    {"start": 65536, "length": 65536, "zero": True, "hole": False},
	    {"start": 131072, "length": 65536, "zero": True, "hole": True},

	- The first extent is a data extent.
	- The second extent is a zero extent but not a hole. When copying
	  this extent, the destination image must be zeroed.
	- The last extent is a hole; when copying this extent, the destination
	  image must not be zeroed.

	Bug-Url: https://bugzilla.redhat.com/1847090
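
The zero/hole semantics above determine what a copy client must do for each
extent. A minimal sketch of such client logic (hypothetical illustration,
not Vdsm code; the `copy_extent` helper and the in-memory streams are
assumptions for the example):

```python
def copy_extent(extent, src, dst):
    """Copy one extent from src to dst based on its zero/hole flags."""
    start, length = extent["start"], extent["length"]
    if not extent["zero"]:
        # Data extent: copy the bytes as-is.
        src.seek(start)
        dst.seek(start)
        dst.write(src.read(length))
    elif not extent["hole"]:
        # Zero extent that is not a hole: the destination must be zeroed.
        dst.seek(start)
        dst.write(b"\0" * length)
    # Hole extent (zero=True, hole=True): write nothing; the
    # destination must not be zeroed (unallocated area).

extents = [
    {"start": 0, "length": 65536, "zero": False, "hole": False},
    {"start": 65536, "length": 65536, "zero": True, "hole": False},
    {"start": 131072, "length": 65536, "zero": True, "hole": True},
]
```

A client would iterate the extents response and apply `copy_extent` to each
entry, writing data, explicit zeroes, or nothing at all.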

2020-09-23  Dan Kenigsberg  <danken@redhat.com>

	drop API.py from execcmd-allowlist

	build: rename execcmd-blacklist
	execcmd-blacklist.txt holds filenames in which we temporarily allow
	using the ugly execCmd function. It's not a blacklist, it's an
	allowlist. Let's name it properly.

2020-09-23  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.31

2020-09-23  Nir Soffer  <nsoffer@redhat.com>

	readme: Show how to use virtual environment
	It is possible to install tox with --user, but having a virtual
	environment separated from the host is the best way to avoid trouble.
	Show how to create a virtual environment and how to use it.

2020-09-23  Liran Rotenberg  <lrotenbe@redhat.com>

	vmdevices: forward compatibility for lun guestName
	The engine expects an imageID element when it sees a guestName
	element. Without populating it, a NullPointerException is caused in
	the engine, breaking forward compatibility.

	This patch populates the imageID property in the disk metadata for
	LUN disks with the LUN's GUID, restoring forward compatibility with
	the engine.

	Bug-Url: https://bugzilla.redhat.com/1859092

2020-09-23  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: mark no failure as 0 instead of None
	Use 0 to mark that there was no failure communicating with qemu-ga
	yet. This makes it easier to build the check condition.

2020-09-23  Nir Soffer  <nsoffer@redhat.com>

	readme: Replace yum with dnf

2020-09-23  Milan Zamazal  <mzamazal@redhat.com>

	spec: Depend on libvirt 6.6.0-6
	It fixes VM start failures when a SCSI host device is present.

	Bug-Url: https://bugzilla.redhat.com/1876605

2020-09-22  Amit Bawer  <abawer@redhat.com>

	tests: Add merge_test module to virt tests
	Currently adds testing for live merge cleanup thread
	state transitions.

	Bug-Url: https://bugzilla.redhat.com/1857347

	vm: Stop live merge cleanup attempts for unrecoverable pivot error
	If a pivot attempt failed due to an unrecoverable libvirt error, the
	cleanup thread would still be re-invoked on the next block jobs query
	cycle. There is no purpose for such retries when the libvirt error is
	not for an ongoing block copy.

	This patch adds exception handling for BlockJobUnrecoverableError in
	case the libvirt pivot operation raises a non-block-copy error. This
	ensures the cleanup thread will not be re-invoked for the failing
	block commit job in a futile attempt to recover it later, as its
	state was set to ABORT.

	In case cleanup has failed with an unrecoverable error, the following
	error is printed to log:

	ERROR (merge/job-name) Pivot failed (job: job-id): Block job job-id failed with libvirt error: error, aborting due to an unrecoverable error

	Bug-Url: https://bugzilla.redhat.com/1857347

	vm: Use state indication for live merge cleanup thread
	This change allows following the cleanup thread by observing its
	states:

	- TRYING: Starting state for a fresh thread, trying to pivot.
	- RETRY: Recoverable failure during cleanup, indicating to the caller
	         that the thread run should be retried.
	- DONE: Successful completion state for the cleanup thread.

	In case the cleanup run has terminated with a recoverable error,
	the corresponding log warning is printed:

	WARNING (merge/job-name) Pivot failed (job: job-id): Block copy job job-id is not ready for commit, retrying later

	And the thread will transition into the RETRY state.

	In a followup patch an ABORT state will be added to indicate that an
	unrecoverable error was met during the pivot operation and merge
	cleanup should not be reattempted.

	Bug-Url: https://bugzilla.redhat.com/1857347

2020-09-22  Liran Rotenberg  <lrotenbe@redhat.com>

	vmdevices: add mapping of lun disks
	When a direct LUN is used as a disk, the qemu-guest-agent reads its
	logical name within the guest. This patch adds the logical name of
	such devices to the VM metadata.

	Bug-Url: https://bugzilla.redhat.com/1859092

2020-09-22  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: remove dead appsList code
	Originally we used to mimic the behavior of oVirt Guest Agent and
	report qemu-ga as 'QEMU guest agent' in apps on Windows. During the
	transition to the libvirt API this code turned out to be dead, and we
	started reporting qemu-ga as 'qemu-guest-agent-<version>' just like
	we do on Linux. Instead of trying to fix the code for Windows, it is
	better to drop the dead code altogether and keep reporting what we do
	now. The engine needs the format with the qemu-ga version for
	tracking the tools update process.

	Bogus constant _GUEST_OS_LINUX is also removed because it is misleading.
	There is no single value for linux OSes as it varies between
	distributions (e.g. 'rhel', 'fedora', etc.).

2020-09-22  Ales Musil  <amusil@redhat.com>

	net: Remove unused code from configurator
	With change to pure nmstate some code is no longer
	needed.

	Bug-Url: https://bugzilla.redhat.com/1853320

	net, nmstate: Use common switch split for validator
	Validator will use same switch split as nmstate code
	which should prevent bugs and inconsistencies.

	Previously running networks switch were taken from
	netinfo which caused confusion. In the code there was
	no way to tell the difference between ovs and nmstate
	easily. Now when the information is taken from
	running config the difference should be clear.

	Bug-Url: https://bugzilla.redhat.com/1853320

	net, nmstate: Provide straight path for setup networks
	Setup network calls only nmstate. This leads to removal
	dependency on netinfo as nmstate does not need netinfo
	to do the actual setup networks.

	Bug-Url: https://bugzilla.redhat.com/1853320

	net, nmstate: Move switch split to netswitch util
	This code is supposed to be used by both netswitch and nmstate, which
	can be easily achieved by moving it to a common place.

	To keep the switch split compatible, removal of a non-existing
	network will default to the legacy switch type.

	Bug-Url: https://bugzilla.redhat.com/1853320

	net, nmstate: Remove config option to disable nmstate
	By removing the config option, the user will have to use nmstate as
	the network backend.

	This is the first step in a series of patches which should lead to
	the removal of most of the old vdsm network code.

	Bug-Url: https://bugzilla.redhat.com/1853320

	net, tests: Fix unit tests to use nmstate backend
	Unit tests were configured not to use nmstate as the default backend.
	Fix the default.

	Bug-Url: https://bugzilla.redhat.com/1853320

	net, tests: Remove a possibility to run legacy backend tests
	With the deprecation it is no longer useful to run those tests.
	Remove the possibility of configuring a run of those tests.

	Bug-Url: https://bugzilla.redhat.com/1853320

	net, doc: Clear old functional tests documentation
	Some parts of the functional test doc are not relevant anymore.
	Remove them to keep the doc clear.

	Bug-Url: https://bugzilla.redhat.com/1853320

	net, tests: Clear unused tests from check-network stage
	Init scripts are deprecated since oVirt 4.4. There is no point
	in keeping those tests if they are not checked anymore.

	Bug-Url: https://bugzilla.redhat.com/1853320

2020-09-21  Eyal Shenitzky  <eshenitz@redhat.com>

	API.py: Add new addBitmaps flag to Volume.create() verb
	qemu-img now supports the 'bitmap' sub-command to perform operations
	on bitmaps that were created during a backup operation.

	A new flag was added to the Volume.create() verb. This flag instructs
	the host, in case of snapshot creation, to add all the bitmaps from
	the source volume to the newly created volume under the same bitmap
	names. Currently, in this case, the bitmaps are not added to the
	newly created volume, so a subsequent backup operation will fail.

	This patch only adds the API for adding the bitmaps when a snapshot
	is added to a volume; the following patches will implement adding the
	bitmaps for file and block volumes.

	Bug-Url: https://bugzilla.redhat.com/1861667

2020-09-15  Amit Bawer  <abawer@redhat.com>

	vm: Use vdsm.common.errors.Base for internal module exceptions base
	It would still have Exception as base class but will also provide
	the same __str__ method some of the internal exceptions need to use
	without the redundant implementation on the derived level.

	This is followup to CR comment: https://gerrit.ovirt.org/#/c/110950/5/lib/vdsm/virt/vm.py@158

	Bug-Url: https://bugzilla.redhat.com/1857347

2020-09-14  Eyal Shenitzky  <eshenitz@redhat.com>

	qemuimg.py: add support for '--bitmaps' flag in qemuimg.convert()
	When the '--bitmaps' flag is used in qemuimg.convert(), QEMU is
	instructed to add all the bitmaps from the source volume to the
	destination volume.

	This patch adds the support for this flag in qemuimg.convert().

	Bug-Url: https://bugzilla.redhat.com/1861667

	qemuimg.py: add support for bitmaps operations
	 - Add support for using the new 'bitmap' sub-command
	 - Add 'bitmaps' parsing for qemuimg.info()

	Bug-Url: https://bugzilla.redhat.com/1861667

2020-09-14  Andrej Cernek  <acernek@redhat.com>

	net, tests: clearly separate test and setup in switch_type_change_test

	net, tests: clearly separate test and setup in stats_test

	net, tests: clearly separate test and setup in static_ip_test

	net, tests: autouse preserve_conf in static_ip_test
	Since this fixture is not used in the tests directly, it's
	better to autouse it.

	net, tests: clearly separate test and setup in rollback_test

	net, tests: clearly separate test and setup in netrestore_test

	net, tests: clearly separate test and setup in net_with_bond_test

	net, tests: clearly separate test and setup in net_qos_test

	net, tests: use fixture for setup in net_basic_test

	net, tests: clearly separate test and setup in link_mtu_test

	net, tests: clearly separate test and setup in dynamic_ip_test

	net, tests: clearly separate test and setup in dns_test

	net, tests: clearly separate test and setup in bond_basic

	net, tests: create test adapter only once per session
	Each module has been creating its own test adapter at the module
	level through a pytest fixture. This is not really needed and only
	creates unnecessary code repetition compared to a session-level
	fixture.

	net, tests: remove py2 legacy code from functional
	Since we no longer run the tests on py2, both six and __future__ are
	unnecessary.

	net, tests: remove redundant function from func adapter

2020-09-13  Vojtech Juranek  <vjuranek@redhat.com>

	tests: fix failing lvmfilter test
	Move /sys/block/{}/device/subsystem into a constant so we can easily
	monkeypatch it in the test, and create a new fixture which simulates
	that an sda device is present and is a scsi device. Without this
	fixture, lvmfilter_test.test_find_wwids() fails on machines which
	don't have an sda device present.

	Bug-Url: https://bugzilla.redhat.com/1837864

2020-09-11  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.30

2020-09-10  Amit Bawer  <abawer@redhat.com>

	lvmfilter: Add blacklist check to analyzer decision
	This also takes into account the wanted and current wwids configured
	for the Vdsm multipath blacklist when making an advice regarding the
	lvm filter configuration and the correlated disks to be blacklisted.

	Bug-Url: https://bugzilla.redhat.com/1837864

	mpathconf: Add module
	This module helps configure the multipath blacklist during lvm filter
	configuration, to prevent multipath from claiming disks that are
	locally mounted and were not used by multipath so far.

	Bug-Url: https://bugzilla.redhat.com/1837864

	lvmfilter: Add function for finding the WWIDs of lvm mount disks
	This will help to set the WWIDs of the disks to be blacklisted
	from multipath on lvm filter configuration.

	Bug-Url: https://bugzilla.redhat.com/1837864

	lvmfilter: Add function for finding disk devices of lvm devices
	This allows querying the lvmfilter module for the devices underlying
	the /dev/mapper devices used for the local PVs we want to add to the
	lvm filter.

	We gather only disk devices for non-multipath devices, as we do not
	want to blacklist already configured multipath devices on the host,
	but rather to prevent multipath from claiming local devices which
	were not originally managed by it.

	The disk devices' WWIDs will be added to the multipath blacklist
	config, so their names remain consistent across reboots and upgrades.

	Bug-Url: https://bugzilla.redhat.com/1837864

2020-09-10  Ales Musil  <amusil@redhat.com>

	net, nmstate: Use the new OvS netinfo for reports
	Expose the OvS netinfo API and update devices netinfo
	with info computed about OvS.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Report ip configuration for OvS in capabilities
	Add the ability for the ovs netinfo to report the ip configuration as
	well. Everything is taken from nmstate, which works because we
	configure ovs only through nmstate.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Add default route destination class
	In order to reuse the default route destination, add the constants to
	a separate class.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Optimize usage of state_show
	Calling state_show multiple times in the flow is not ideal.
	Use single function that will retrieve the full state that
	can be passed to the filtering functions.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Move common functionality for dhcp and autoconf check
	The api function and the ovs netinfo can use the same logic for
	getting dhcp and autoconf. Move it to a common place.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Add support for basic OvS get caps
	The caps returned for OvS should correspond to the common format.
	This includes faking certain devices, because OvS works differently
	from Linux bridge.

	Some pieces are still missing from the report. Most importantly, IP
	configuration reporting is missing, as are STP and the other ports
	that are configured per specified network.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Fix support for vlan with id 0
	Linux bridge already supports this scenario, but it was forgotten
	for OvS.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Move SwitchType to bridge_util
	SwitchType can be used by the ovs netinfo and setup networks.
	Move it to a common place.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Move translate config to bridge util
	Translate config can be used for getting the ovs netinfo and for
	setup networks. Move it to a common place.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Move important info in preparation for getCaps
	Move the construction of info about ovs to its own class. This can be
	reused by setup networks and, at the same time, by refresh caps.

	Bug-Url: https://bugzilla.redhat.com/1809102

2020-09-10  Vojtech Juranek  <vjuranek@redhat.com>

	lvmfilter: normalize resolved links
	If the user has a correct filter, but with a different order of items
	than the suggested filter, filter configuration fails. Normalize
	resolved paths to ensure we always compare items in the same order.

	Bug-Url: https://bugzilla.redhat.com/1837864
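
The idea of the normalization above can be sketched as follows (hypothetical
helper names and made-up PV UUIDs for illustration; not the actual vdsm
lvmfilter code): resolved filter items are put into a canonical order before
comparing, so a correct filter that merely lists its items in a different
order is accepted.

```python
def normalize(items):
    """Return filter items in canonical (sorted) order for comparison."""
    return sorted(items)

def same_filter(current, suggested):
    """Compare two lvm filters, ignoring item order."""
    return normalize(current) == normalize(suggested)

# Same accept entries, listed in a different order (UUIDs are made up).
current = [
    "a|^/dev/disk/by-id/lvm-pv-uuid-bbbb$|",
    "a|^/dev/disk/by-id/lvm-pv-uuid-aaaa$|",
    "r|.*|",
]
suggested = [
    "a|^/dev/disk/by-id/lvm-pv-uuid-aaaa$|",
    "a|^/dev/disk/by-id/lvm-pv-uuid-bbbb$|",
    "r|.*|",
]
```

With this normalization, `same_filter(current, suggested)` holds even though
the accept entries appear in a different order; note the sorted form is used
only for comparison, not as the filter actually written to lvm.conf.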

2020-09-09  Eyal Shenitzky  <eshenitz@redhat.com>

	spec: Update qemu-kvm requirement for RHEL AV 8.3
	Update qemu-kvm requirement to support new 'qemu-img bitmap' command
	for RHEL-8.3.

	Also bump the CentOS qemu-kvm version, although it still does not
	support the 'qemu-img bitmap' command.

	Bug-Url: https://bugzilla.redhat.com/1861667

2020-09-09  Vojtech Juranek  <vjuranek@redhat.com>

	storage: don't hardcode device-mapper major number
	The major number for device-mapper devices is usually 253, but in
	some rare cases (e.g. VMs installed via virt-builder), this number is
	different. The range 240-254 is for experimental usage and number 253
	is not reserved for device-mapper; see [1] for reserved numbers and
	more details.

	Don't hardcode the number; instead, find it on every host by reading
	/proc/devices and searching for the device-mapper major number.

	[1] https://www.kernel.org/doc/Documentation/admin-guide/devices.txt
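
The lookup described above can be sketched like this (a simplified
illustration, not the actual vdsm code; the function name and parsing details
are assumptions): scan /proc/devices for the "Block devices:" section and
return the major number listed next to device-mapper.

```python
def find_dm_major(proc_devices_text):
    """Return the device-mapper major number from /proc/devices content."""
    in_block_section = False
    for line in proc_devices_text.splitlines():
        line = line.strip()
        if line == "Block devices:":
            in_block_section = True
            continue
        if in_block_section and line.endswith(" device-mapper"):
            # Lines look like "253 device-mapper".
            major, _ = line.split(None, 1)
            return int(major)
    raise LookupError("device-mapper major number not found")

# On a real host one would call:
# with open("/proc/devices") as f:
#     dm_major = find_dm_major(f.read())
```

On most hosts this returns 253, but the point of the change is exactly that
the value may differ, so it is read per host instead of hardcoded.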

2020-09-09  Eli Mesika  <emesika@redhat.com>

	core: Add support for cluster version 4.5
	This patch adds support for cluster version 4.5.

	Since the new cluster version support will be available only
	on libvirt >= 6.6, and we don't want to break existing
	tests still running older libvirt versions, the capability
	is added dynamically.

2020-09-08  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.29

	automation: Remove version from python3-ioprocess
	Vdsm depends on python3-ioprocess >= 1.4.1, and 1.4.2 is the
	currently available version, which makes the current automation
	requirement fail.

2020-09-08  Vojtech Juranek  <vjuranek@redhat.com>

	lvmfilter: replace unstable names in filter with stable ones
	In the past we used unstable device names in the filter
	(e.g. /dev/sda1). Such names can change during reboots and the filter
	can become invalid.

	Add a new check to the lvm filter for whether we can replace an
	existing filter with unstable names with one using stable names. The
	replacement is suggested only in case we can replace all items and
	the filter wasn't modified by the user. E.g. if the user added more
	items (no matter whether with stable or unstable names), we won't do
	any replacements and will suggest fixing the filter manually. Also,
	as we want to replace names with ones containing UUIDs, we cannot
	replace any regular expressions with wildcards. The only exception is
	ignoring all devices ("r|.*|"), which is kept as is in case of
	replacement.

	If we detect that the filter can be replaced by a new one with stable
	names, a CONFIGURE advice with the new filter is returned; otherwise
	a RECOMMEND with the new filter is returned.

	Bug-Url: https://bugzilla.redhat.com/1837864

2020-09-08  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: release lock after timeout reset
	To avoid accidental races when setting the timeout, we should first
	reset the default value and then release the lock. Otherwise we could
	possibly reset a timeout that was already set in a different thread.

	To make sure the lock is always released, even in case of a libvirt
	error, another try-finally block is added.

	qga: the default setting for context should be blocking
	The `default` timeout value means 5 seconds for `guest-sync` and 5
	seconds for the requested command. This is not the same as the
	behavior before the timeout setting API was introduced in libvirt. In
	fact, the original behavior corresponds to the `blocking` setting,
	which is not 100% blocking but still has a 5 second timeout on the
	`guest-sync` command and blocks only for the duration of the
	requested command. The `blocking` setting matches exactly the
	behavior prior to the introduction of the libvirt timeout API.
	Without it, some long-running tasks (e.g. filesystem freeze) can time
	out unexpectedly.

	Bug-Url: https://bugzilla.redhat.com/1870500

2020-09-07  Vojtech Juranek  <vjuranek@redhat.com>

	storage: don't use get() method on Drive object
	In commit

	    commit a4771a98ec44966f402eca34d2dfe660031f2b7e
	    Author: Eyal Shenitzky <eshenitz@redhat.com>
	    Date:   Wed Apr 10 11:59:30 2019 +0300

	        vm: refactor _findDeviceByNameOrPath() and make it public

	when refactoring Vm.find_device_by_name_or_path(), we changed the way
	the path attribute is accessed on the device to use the get() method.
	However, the drive is of type vmdevices.storage.Drive and this class
	has no get() method. Revert this change and access path via
	__getitem__().

	Disk devices are set in Vm._perform_host_local_adjustment() [1] and
	the device is always an instance of vmdevices.storage.Drive.

	[1] https://github.com/oVirt/vdsm/blob/v4.40.22/lib/vdsm/virt/vm.py#L2434

	Bug-Url: https://bugzilla.redhat.com/1875805

2020-09-07  Nir Soffer  <nsoffer@redhat.com>

	lvm: Disable LVs caching
	Recent failures in OST are caused by enabling LVs caching in:

	commit 58e3efe5bd1f04f51cbcb5e43ad09634818c1569
	Author: Amit Bawer <abawer@redhat.com>
	Date:   Mon Mar 16 12:56:31 2020 +0200

	    lvm: Turn stalelv boolean indication into freshlv set

	What we see in OST:

	host-0 (SPM):

	2020-09-04 18:24:43,925: Starting create LV (create LV).
	2020-09-04 18:24:46,043: Completing create LV (change tags).

	host-1:

	2020-09-04 18:24:58,543: Prepare image fails, volume does not exist.

	Preparing an image does:

	    HSM.prepareImage()
	        BlockStorageDomain.getAllVolumes()
	            BlockStorageDomain.getAllVolumesImages()
	                blockSD.getAllVolumes()
	                    blockSD._getVolsTree()
	                        blockSD._iter_volumes()
	                            lvm.getLV()
	                                LVMCache.getLv()
	                                    LVMCache._lvs_needs_reload()

	On the SPM, creating an LV invalidates the cache, so
	_lvs_needs_reload() will return True. On other hosts, if LVs were
	reloaded recently, the cache is considered fresh, and
	_lvs_needs_reload() will return False. Then we use the cached LVs,
	which do not include the new LV created on the SPM.

	Add a cache_lvs flag to LVMCache, set to False by default. Use the
	lvs cache only when the flag is True. This is currently used only in
	the tests, verifying the logic for using the cache.

	We can use the new cache_lvs flag on the SPM to avoid unneeded
	reloads in the future, or when we have a way to detect changes in the
	VG from other hosts.

	Bug-Url: https://bugzilla.redhat.com/1876230

2020-09-03  Vojtech Juranek  <vjuranek@redhat.com>

	storage: fix typos in lvmfilter module

2020-09-02  Milan Zamazal  <mzamazal@redhat.com>

	virt: Return back the check for host shutdown
	When a VM stops running due to shutdown, libvirt tells us whether the
	reason was shutdown from within the guest or from the host.  In commit
	21d8c83d700b99827784585fb91087cd9fe319f3, we replaced some of our
	guesswork with the information from libvirt.

	However, that information is also not completely reliable.  When a
	host is shut down by an administrator, e.g. by running `poweroff',
	the libvirt-guests service gets invoked and attempts to shut down the
	running VMs gracefully, using the `virsh shutdown' command.  This is
	a good thing to do, but it then looks like a user initiated shutdown
	to QEMU and libvirt (confirmed in
	https://www.redhat.com/archives/libvirt-users/2020-August/msg00095.html).
	If the stopped VM is a highly available VM, it's not started on
	another host, due to the wrong shutdown reason reported to Engine.

	Let's fix that by bringing back one of our former checks for host
	shutdown.  The aforementioned commit tried to fix a similar issue
	where a host shutdown wasn't detected, under different circumstances,
	but it created the problem described above.  Adding the additional
	check fixes that.  We also add the check when receiving an unexpected
	shutdown detail; it doesn't harm there and can help identify the
	right shutdown reason.

	Bug-Url: https://bugzilla.redhat.com/1800966

	virt: Reduce the nesting level in Vm._handle_libvirt_domain_shutdown
	A little refactoring to make the followup patch nicer.

2020-09-02  Marcin Sobczyk  <msobczyk@redhat.com>

	spec: Don't recreate config on upgrades
	In [1] we introduced a fix for coredump generation on hosts.
	Unfortunately the scriptlet in the spec didn't take upgrade scenarios
	into account, and now these attempts end with an error:

	 ln: failed to create symbolic link '/etc/sysctl.d/50-coredump.conf': File exists

	This patch changes the scriptlet to only create the symlink
	on clean installations.

	[1] https://gerrit.ovirt.org/#/c/107514/

	Bug-Url: https://bugzilla.redhat.com/1874807

2020-09-02  Benny Zlotnik  <bzlotnik@redhat.com>

	vm: enhance merge log
	Log the drive's name and alias, as well as the drive's chain to improve the
	debugging experience.

	vm: enhance clear_drive_threshold log
	Log the drive's alias, as well as the drive's chain to improve the
	debugging experience.

2020-09-01  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.28

2020-08-31  Ales Musil  <amusil@redhat.com>

	net: Fix validation for used vlan
	We do not allow unmanaged vlans to share a base interface with a
	bond. The problem was that the current code did not check whether the
	vlan was unmanaged, which resulted in an "Interface in use"
	validation error for vlans managed by vdsm.

	Include only the unmanaged vlans in the validation. Since it is not
	possible to take over a vlan and define a bond over its base
	interface in one operation, the check can be done against the current
	network state rather than the desired one.

	Bug-Url: https://bugzilla.redhat.com/1870148
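
	A minimal sketch of the check described above, under hypothetical
	names (vdsm's real netinfo structures differ):

```python
def conflicting_vlans(vlans, bond_slaves):
    # Only unmanaged vlans sharing a base interface with the bond
    # should fail validation; vlans managed by vdsm are excluded.
    return sorted(
        v["name"]
        for v in vlans
        if not v["managed"] and v["base_iface"] in bond_slaves
    )


vlans = [
    {"name": "vlan10", "managed": True, "base_iface": "eth0"},
    {"name": "vlan20", "managed": False, "base_iface": "eth0"},
    {"name": "vlan30", "managed": False, "base_iface": "eth1"},
]
# Only the unmanaged vlan over eth0 conflicts with a bond using eth0.
conflicts = conflicting_vlans(vlans, {"eth0"})
```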

2020-08-31  Nir Soffer  <nsoffer@redhat.com>

	static: Support all-in-one setup
	All-in-one setup, where both engine and vdsm run on the same host,
	was deprecated in 3.6 and became unsupported in 4.0. However, users
	are still using this setup for various reasons.

	With this setup we have both engine and vdsm configuration in:

	    /etc/ovirt-imageio/conf.d/
	        50-engine.conf
	        50-vdsm.conf

	Since 50-vdsm.conf sorts after 50-engine.conf, vdsm configuration
	overrides engine configuration. However, engine sets:

	[local]
	enable = false

	[control]
	transport = tcp

	Vdsm sets nothing, since the defaults are good enough. So engine
	configuration breaks the control socket, and vdsm fails to add tickets
	with:

	    Image daemon is unsupported

	To fix this, vdsm configuration was renamed to 60-vdsm.conf, to make it
	clear that both engine and vdsm configuration can be on the same host,
	and the missing configuration was added to vdsm configuration.

	Bug-Url: https://bugzilla.redhat.com/1871348

2020-08-28  Milan Zamazal  <mzamazal@redhat.com>

	virt: Don't send device hash for non-libvirt domain XML
	When a VM is in the process of starting, Engine may call
	getAllVmStats, find a device hash there for the starting VM and since
	it has no previous device hash it asks for the VM domain XML by
	calling dumpxmls.  However, the VM may not have its domain XML
	obtained from libvirt and then dumpxmls responds with the domain XML
	initially obtained from Engine.  The domain XML doesn't contain added
	information from libvirt for newly created VMs yet, but Engine expects
	it and may complain with messages such as:

	  ERROR ... managed non pluggable device was removed unexpectedly from libvirt: ...

	or

	  DEBUG ... managed pluggable device was unplugged : ...

	It should normally get fixed in a while, by a later dumpxmls call,
	unless the VM fails to finish its start, in which case the devices
	would remain unplugged.

	This patch distinguishes between domain XMLs obtained from Engine and
	libvirt and doesn't report device hashes for the former.  This is an
	API change but Engine is happy with it and stops unplugging the
	devices and logging the errors.

	We also consider _srcDomXML initial, because it is retrieved from
	libvirt as migratable and needn't be valid at all from some points of
	view.

	Bug-Url: https://bugzilla.redhat.com/1870108

2020-08-27  Steven Rosenberg  <srosenbe@redhat.com>

	virt: Convert mem_bps to Mbps
	Converts the mem_bps value to Mbps when printing
	to the log for consistency with the WebAdmin and
	standard practice.

	Bug-Url: https://bugzilla.redhat.com/1845397

2020-08-26  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.27

2020-08-25  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: reset users list when all users log off the guest
	When all users logged off and the user count reached 0, we did not
	properly reset the list of users. Thus the last known non-empty set of
	users was still reported indefinitely.

	Bug-Url: https://bugzilla.redhat.com/1871202

2020-08-24  Amit Bawer  <abawer@redhat.com>

	monitor: Use a list copy of the monitor values for stopping
	The stopMonitors method loops over the monitors dict and removes
	items from it once the corresponding monitor thread has stopped.

	This leads to a RuntimeError, since we pass a view of the monitors
	dict itself, which the method then iterates over while changing its size.

	 File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 3429, in prepareForShutdown
	    self.domainMonitor.shutdown()
	  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 254, in shutdown
	    self._stopMonitors(self._monitors.values(), shutdown=True)
	  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 273, in _stopMonitors
	    for monitor in monitors:
	RuntimeError: dictionary changed size during iteration

	This regression was introduced as part of monitor fixes patch:

	 commit cecc2d1f9d77f80843ca8977936d3479f25bf609
	 Author: Amit Bawer <abawer@redhat.com>
	 Date:   Tue Jul 14 11:08:00 2020 +0300

	     monitor: Add monitors dictionary lock
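
	The failure mode and its fix can be sketched like this (FakeMonitor
	and the dict layout are hypothetical stand-ins for the monitor
	module):

```python
class FakeMonitor:
    def __init__(self, name):
        self.name = name
        self.stopped = False

    def stop(self):
        self.stopped = True


def stop_monitors(monitors):
    # Iterating monitors.values() directly while deleting entries would
    # raise "RuntimeError: dictionary changed size during iteration".
    # A list copy decouples the iteration from the mutation.
    for monitor in list(monitors.values()):
        monitor.stop()
        del monitors[monitor.name]


monitors = {n: FakeMonitor(n) for n in ("sd1", "sd2", "sd3")}
snapshot = list(monitors.values())
stop_monitors(monitors)
```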

	lvm: Turn stalelv boolean indication into freshlv set
	Use a freshlv set indicating that all lvs are cache fresh for a vg,
	when vgName is part of the set.

	This avoids one call to reloadlvs(vgName="vg1", lvNames=None)
	clearing the old boolean stalelv indication for any subsequent
	cache-reload decision in getLV(vgName="vg2", lvName=None), which
	would wrongfully skip the cache reload for vg2.
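
	A hypothetical sketch of the semantics change (these names are not
	vdsm's lvm API):

```python
class LVCache:
    """Tracks which vgs have all their lvs fresh in the cache."""

    def __init__(self):
        # Replaces a single boolean "stalelv" flag, which was global
        # and therefore leaked between vgs.
        self._freshlv = set()

    def lvs_reloaded(self, vg_name):
        self._freshlv.add(vg_name)

    def needs_reload(self, vg_name):
        # With the old boolean, reloading vg1 cleared the stale flag
        # for every vg, wrongly skipping the reload for vg2. The set
        # scopes the decision to the vg actually reloaded.
        return vg_name not in self._freshlv


cache = LVCache()
cache.lvs_reloaded("vg1")
```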

2020-08-21  Germano Veit Michel  <germano@redhat.com>

	dump-volume-chains: stop using six.iteritems
	There are no plans to backport recent changes to el7/py2,
	so we can stop using six.iteritems.

2020-08-20  Nir Soffer  <nsoffer@redhat.com>

	tool: multipath: Improve documentation
	The existing documentation was focused on a single use case. Replace it
	with a more general text explaining all possible use cases and how the
	current code deals with all of them.

	This area of the code was broken many times in the past because it was
	poorly documented and developers did not understand the different use
	cases. I hope that with the new documentation this will not happen
	again.

2020-08-20  Andrej Cernek  <acernek@redhat.com>

	net, tests: refactor tc tests to use fixtures
	To take advantage of pytest fixtures, test setups have been separated
	from tests themselves. This allows better re-usability as well as more
	readable and less indented code.

	net, tests: refactor netinfo tests to use fixtures
	To take advantage of pytest fixtures, test setups have been separated
	from tests themselves. This allows better re-usability as well as more
	readable and less indented code.

	net, tests: refactor link bond tests to use fixtures
	To take advantage of pytest fixtures, test setups have been separated
	from tests themselves. This allows better reusability as well as more
	readable and less indented code.

	net, tests: remove unnecessary nics from link bond test
	There are dummy_devices in test_bond_update_existing_arp_ip_targets
	that are not used at all; this patch removes them.

2020-08-19  Ales Musil  <amusil@redhat.com>

	net, nmstate, tests: Add unit tests for ovs basic scenarios
	Add tests that cover basic scenarios to ensure that
	the bridge is created or reused correctly. Conversely,
	if the last network over a bridge is removed, ensure
	that the bridge is removed as well.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Expose OvS through generate_state
	By exposing the OvS switch it is now possible to
	configure networking with the combination of nmstate
	and ovs. However, this is by no means complete.
	The exposure is currently used only by the unified
	unit test API.

	It is highly advised to actually NOT use ovs over
	nmstate in the current state.

	Bug-Url: https://bugzilla.redhat.com/1809102

	net, nmstate: Introduce basic OvS networking
	Add a module that allows configuration of ovs
	through nmstate. The initial part contains only the
	bare minimum to configure a network that uses ovs.

	The structure of the ovs network is generally simpler
	than linux bridge network. On the diagram [0] you can
	see the basic structure.

	The SB represents the southbound interface, which is usually a
	physical nic or a bond. For every SB there exists only a single
	bridge that has the SB as one of its ports.

	Any network defined from the oVirt point of view takes
	the form of an NB (northbound). An NB in ovs is represented
	as a single ovs internal interface connected via a port
	to the bridge. This simplifies VLAN management
	because the filtering is specified per port.

	The above implies that in ovs there is no form
	of bridgeless network, as every network is connected
	to a given bridge.

	Another slight caveat with this approach is that bridge
	parameters, most importantly STP, are hard to enforce
	on the bridge, as the connected networks might disagree on
	whether they should be used or not.

	[0]
	+------+  +------+  +------+
	|      |  |      |  |      |
	|  NB  |  |  NB  |  |  NB  |
	|      |  |      |  |      |
	+--+---+  +--+---+  +---+--+
	   |         |          |
	   |    +----+-----+    |
	   |    |          |    |
	   +----+  OvS br  +----+
	        |          |
	        +----+-----+
	             |
	             |
	        +----+-----+
	        |          |
	        |    SB    |
	        |          |
	        +----------+

	Bug-Url: https://bugzilla.redhat.com/1809102

2020-08-19  Amit Bawer  <abawer@redhat.com>

	constants: Remove EXT_MULTIPATH and MULTIPATH_PATH
	Previous commit removed the last usage for EXT_MULTIPATH
	so we don't need the legacy path and executable constants
	anymore.

2020-08-19  Andrej Cernek  <acernek@redhat.com>

	net, tests: refactor lldpad test

	net, tests: use pytest for setup of integration tests

	net, tests: move nmdbus tests to pytest

	net, tests: move nm tests to pytest

	net, tests: reformat tc tests

	net, tests: move sourceroute tests to pytest

	net, tests: move link vlan tests to pytest

	net, tests: move link iface tests to pytest

2020-08-19  Marcin Sobczyk  <msobczyk@redhat.com>

	configure: Replace '/var/run/' usage with '/run'
	Usage of '/var/run' directory is deprecated [1] and the path
	itself is a symlink to '/run'. This patch replaces all references
	to '/var/run' with '/run'.

	[1] https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html#/run/%20and%20/var/run/

2020-08-19  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.26

2020-08-13  Amit Bawer  <abawer@redhat.com>

	tools: Start and reconfigure multipathd
	In the following host setup scenario:

	- host is booting from a multipath device.
	- host is using the default multipath configuration option:
	 "user_friendly_names yes".
	- host is not running multipathd service by default.

	the root file system is on device /dev/mapper/mpath{X}. As part of
	Vdsm configuration, we disable user friendly names, and reload
	multipath configuration. This renames /dev/mapper/mpath{X} to
	/dev/mapper/{WWN}.

	Right after we reload multipath we create lvm filter based on the
	device used for the root filesystem. If multipathd was not running
	when we reloaded the service, the lvm filter will be created using
	the wrong device name (/dev/mapper/mpath{X}) and the next boot will
	fail.

	The purpose of this change is to ensure that existing multipath
	devices using "friendly names" are renamed before the lvm filter is set
	on the host, to prevent the missing rootfs device issue.

	Bug-Url: https://bugzilla.redhat.com/1859876

2020-08-13  Ales Musil  <amusil@redhat.com>

	net, nmstate: Fix KeyError for bridgeless management networks
	Due to a limitation of nmstate, the default route network
	has to be present in every desired state. This caused an error
	when the default route network is bridgeless and has a VLAN defined
	on top of it.

	Pass the configured MTU with the management network.

	Bug-Url: https://bugzilla.redhat.com/1855078

2020-08-12  Marcin Sobczyk  <msobczyk@redhat.com>

	New release: 4.40.25

2020-08-11  Nir Soffer  <nsoffer@redhat.com>

	spec: Require latest ovirt-imageio packages
	Vdsm can work with older versions, but we always want to have the
	latest version, providing new features and better performance.

2020-08-06  Nir Soffer  <nsoffer@redhat.com>

	Revert "tool: Use multipath force reload option (-r) for config changes"
	Using "multipath -r" when multipath is not enabled fails with:

	    vdsm.common.cmdutils.Error: Command ['/sbin/multipath', '-r']
	    failed with rc=1 out=b'' err=b'Aug 05 05:43:11 | DM multipath
	    kernel driver not loaded\\n'"

	If multipath is not enabled, we cannot have devices with wrong names,
	so there is nothing to reload, and failing the pointless command is
	harmful.

	This reverts commit 93944b00498d8b351b747ab31e9d298dbcdfeff2.

2020-08-05  Steven Rosenberg  <srosenbe@redhat.com>

	kvm: Added base class with readinto function
	This fix will add the readinto function to both
	existing derived classes, the VMAdapter and
	the StreamAdapter for importing disks referenced
	by paths.

	Bug-Url: https://bugzilla.redhat.com/1849850

2020-08-04  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.24

2020-08-04  Andrej Cernek  <acernek@redhat.com>

	net, tests: fix wrong assert in netinfo test
	One assert in test_local_auto_with_dynamic_address_from_ra has a typo
	from conversion to pytest. This typo caused the assert to always pass.

	net, tests: move link bond tests to pytest

	net, tests: reformat ip wrapper tests
	Test names should be in snake_case and strings should use '.

2020-08-04  Eyal Shenitzky  <eshenitz@redhat.com>

	backup_test.py: add tests for dump_checkpoint()

	backup_test.py: add tests for delete_checkpoints()

2020-08-04  Amit Bawer  <abawer@redhat.com>

	tool: Use multipath force reload option (-r) for config changes
	In the following host setup scenario:

	- host is booting from a multipath device.
	- host is using the default multipath configuration option:
	  "user_friendly_names yes".
	- host is not running multipathd service by default.

	the root file system is on device /dev/mapper/mpath{X}. As part of
	Vdsm configuration, we disable user friendly names, and reload
	multipath configuration. This renames /dev/mapper/mpath{X} to
	/dev/mapper/{WWN}.

	Right after we reload multipath we create lvm filter based on the
	device used for the root filesystem. If multipathd was not running
	when we reloaded the service, the lvm filter will be created using
	the wrong device name (/dev/mapper/mpath{X}) and the next boot will
	fail.

	The purpose of this change is to ensure that existing multipath
	devices using "friendly names" are renamed before the lvm filter is set
	on the host, to prevent the missing rootfs device issue.

	Bug-Url: https://bugzilla.redhat.com/1859876

	lvmfilter: Use /dev/disk/by-id/lvm-pv-uuid devlinks for pv naming
	Using the conventional /dev/sdX naming may alter between reboots
	which is unsafe for local system lvm filter. Using the pv-uuid
	devlinks provides consistent mapping between the actual PV to be
	kept in the filter and its device path.

	Bug-Url: https://bugzilla.redhat.com/1635614
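
	A sketch of preferring the stable devlink (the lookup logic here is
	hypothetical, not vdsm's lvmfilter implementation):

```python
import glob
import os


def stable_pv_name(pv_path):
    # Prefer a /dev/disk/by-id/lvm-pv-uuid-* devlink that resolves to
    # the same block device as pv_path; such links stay consistent
    # across reboots, unlike /dev/sdX names. Fall back to the original
    # path when no matching devlink exists.
    real = os.path.realpath(pv_path)
    for link in sorted(glob.glob("/dev/disk/by-id/lvm-pv-uuid-*")):
        if os.path.realpath(link) == real:
            return link
    return pv_path
```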

2020-08-04  Andrej Cernek  <acernek@redhat.com>

	net, tests: move ip rule tests to pytest

	net, tests: move ip route tests to pytest

	net, tests: refactor ip address tests

	net, tests: use snake_case in ifcfg config writer tests

	net, tests: move ethtool_test to pytest

	net, tests: move cmd tests to pytest

2020-08-03  Eyal Shenitzky  <eshenitz@redhat.com>

	backup_test.py: add tests for list_checkpoints and redefine_checkpoints

	fakedomainadapter.py: add checkpointCreateXML()
	Add checkpointCreateXML() call for use in later tests
	for redefine_checkpoints() and checkpoints_list().

	backup_test.py: add test for incremental backup flow
	Add test for incremental backup flow that includes backup and
	checkpoint XML verification.

	It allows testing the incremental backup XML in a real flow behavior,
	so now the specific test for incremental backup XML is redundant.

	fakedomainadapter.py: use a list of output checkpoint instead of checkpoint XML
	Changed FakeDomainAdapter to keep the output checkpoints in a list on
	the FakeDomainAdapter object; it now holds FakeCheckpoint objects
	rather than the checkpoint XML string.

	This is needed for later tests for checkpoints_list and
	redefine_checkpoints API.

2020-08-03  Ales Musil  <amusil@redhat.com>

	net, tests: Allow additional pytest arguments
	Allow additional pytest arguments for the functional tests.
	This is useful for failing the whole case upon the first
	failure, or for running only a specific subset of tests.

2020-08-03  Eyal Shenitzky  <eshenitz@redhat.com>

	fakedomainadapter.py: add input_checkpoint_xml member
	Add input_checkpoint_xml to store the created checkpoint_xml while
	FakeDomainAdapter.backupBegin() is called.

	It will allow testing that checkpoint_xml was created as expected in
	the test itself and testing the checkpoint XML in a real flow behavior,
	so now the specific test for checkpoint XML is redundant.

2020-08-03  Ales Musil  <amusil@redhat.com>

	net, tests: Format the integration and unit run-tests scripts

2020-08-01  Amit Bawer  <abawer@redhat.com>

	tests: Add iscsi initiator test
	Add test for iscsi initiator setup and target logins.

	Results for testing 2 iscsi targets over 2 portals (4 connections in total):

	Login Scheme               Online    Active     Total Login
	                           Portals   Sessions   Time (sec)
	--------------------------------------------------------------
	All at once                  2/2        4          2.1
	All at once                  1/2        2        120.2
	Serial target-portal         2/2        4          8.5
	Serial target-portal         1/2        2        243.5
	Concurrent target-portal     2/2        4          2.1
	Concurrent target-portal     1/2        2        120.1

2020-07-30  Germano Veit Michel  <germano@redhat.com>

	tests: reload: Separate write and read delay options
	As a general testing tool, having separate read and write
	delay options allows more flexibility when testing cases
	where we want a quick read during a write or vice versa.

	Related-To: https://bugzilla.redhat.com/1837199

2020-07-29  Milan Zamazal  <mzamazal@redhat.com>

	automation: Add rhel8 distribution

2020-07-29  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: ssl: Handle rhel8 in ssl tests
	Depending on the crypto policy of the distro, some ssl tests
	should fail, and some succeed.

	This patch adds proper handling of rhel, on which both TLS 1.0
	and TLS 1.1 should not work, given the restrictive default crypto
	policy. Additionally, some 'six.PY[23]' conditionals have been removed,
	since we don't support py2 anymore.

2020-07-28  Ales Musil  <amusil@redhat.com>

	pytest: Unify logging configuration
	Logs from pytest runs were configured
	differently in each module, which created inconsistency.

	Unify them under a common pattern that includes
	information that might be helpful for finding issues.
	With that we can remove the individual settings.

	The only place that remains is testrunner.py, which still
	uses nose. Update its format to match the global one.

	net, nmstate: Make the calculation of vlan base mtu more efficient
	Previously the vlan mtu was enforced even if the new mtu
	was exactly the same as the old one. This was
	uncovered by testing with nmstate 0.3. Vdsm would also
	enforce MTU on the base of external vlans, which caused
	failures of some tests.

2020-07-28  Marcin Sobczyk  <msobczyk@redhat.com>

	New release: 4.40.23

2020-07-28  Bell Levin  <blevin@redhat.com>

	net: Add vdsm common patching for unit tests
	The network unit tests should not compile vdsm, since we should
	not rely on the developer's system and should have a universal
	environment. The common files are required by almost
	all network services and therefore need to be patched.

	An el8 pregenerated static version was added to be able to
	run the unit tests without compiling vdsm (this version contains
	the bare minimum of variables needed to pass
	the imports of the network conftest).

	Patching the files in the run-tests script is preferred
	over patching through pytest (in conftest.py):
	constants is imported as early as the conftest itself,
	which would lead to an exception before any test is able to run.

2020-07-27  Amit Bawer  <abawer@redhat.com>

	lvm: Avoid logging Unreadable items for reload when there are none
	We have redundant WARN logging in reload items when LVM command fails
	and no stale items in cache to update as Unreadable.

	For example:

	2020-07-27 02:42:01,077+0300 WARN  (jsonrpc/6) [storage.LVM] All 1 tries have failed:
	cmd=['/sbin/lvm', 'vgs', '--config', 'devices {  preferred_names=["^/dev/mapper/"]
	ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3
	filter=["a|^/dev/mapper/3600a09803830447a4f244c465759562f$|", "r|.*|"]
	hints="none"  obtain_device_list_from_udev=0 } global {  locking_type=1
	prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {
	retain_min=50  retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix',
	'--separator', '|', '--ignoreskippedcluster', '-o',
	'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name',
	'c19634de-80d0-41e8-bf47-9955162303af', '91418569-ab4d-4250-b167-29dc389557ee', '05c517f9-3f82-46d5-a755-740664271168',
	'6a356163-8f2b-46bf-a1cb-115630b0cbe7', '1ae0b42a-b75a-4284-8c13-7147a75e52b5', 'e36daf99-fc33-4ba3-a34b-19b358a05c01']
	rc=5 err=['  Volume group "e36daf99-fc33-4ba3-a34b-19b358a05c01" not found.', '  Cannot process volume group e36daf99-fc33-4ba3-a34b-19b358a05c01'] (lvm:530)

	Which makes the following log print redundant:

	2020-07-27 02:42:01,077+0300 WARN  (jsonrpc/6) [storage.LVM] Marked vgs=[] as Unreadable due to reload failure (lvm:638)

	This change first checks the updated list before printing the log.
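
	The change can be sketched as checking the updated list first
	(hypothetical names and cache shape, not vdsm's lvm module):

```python
import logging

log = logging.getLogger("storage.LVM")


def mark_unreadable(cache, names):
    # Mark only names that are actually stale in the cache, and emit
    # the warning only when something was marked, avoiding the
    # redundant "Marked vgs=[] as Unreadable" line.
    updated = [n for n in names if n in cache]
    for n in updated:
        cache[n] = "Unreadable"
    if updated:
        log.warning(
            "Marked vgs=%s as Unreadable due to reload failure", updated)
    return updated
```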

2020-07-27  Nir Soffer  <nsoffer@redhat.com>

	fc-scan: Limit number of workers on smaller machines
	Looking at fc-scan on an older system with 8 cores, is seems that using
	64 workers causes unwanted load for no benefit. It is also not needed
	since smaller systems are likely to have less running VMs and less LUNs.
	Limit the number of workers to 1 worker per CPU, up to 64 workers.

	Tested on a system with 64 paths to storage. Without this change fc-scan
	will try to scan all devices at once. With this change, 8 devices at the
	same time.

	--------------------------------------
	workers   total  time     rescan  time
	          real    sys     avg      max
	--------------------------------------
	  8       629     686     0.26   18.89
	 64       624     815     0.94   37.09

	Data was collected using this script:

	$ cat fc-scan-test.sh
	for i in $(seq 100); do
	    /usr/libexec/vdsm/fc-scan -v
	    udevadm settle --timeout 10
	    sleep 1
	done

	Related-To: https://bugzilla.redhat.com/1851893
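
	The limit described above can be sketched as follows (MAX_WORKERS
	and the helper name are assumptions):

```python
import os

MAX_WORKERS = 64


def worker_count(devices):
    # One worker per CPU, capped at MAX_WORKERS, and never more
    # workers than there are devices to scan. os.cpu_count() may
    # return None, so fall back to a single worker.
    cpus = os.cpu_count() or 1
    return min(len(devices), cpus, MAX_WORKERS)
```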

	multipath: Fix bad log format
	In this change:

	commit 4a790b7d506414e14faac15d13f4d327485ac5c9
	Author: Vojtech Juranek <vjuranek@redhat.com>
	Date:   Thu Nov 7 23:22:46 2019 +0100

	    multipath: ignore scsi_id failures

	We tried to handle scsi_id failures by logging a debug message and
	returning an empty string. However, the log format was wrong, using %e
	instead of %s, so this log line raises an exception, logged to the system
	journal:

	daemonAdapter[563280]: --- Logging error ---
	daemonAdapter[563280]: Traceback (most recent call last):
	daemonAdapter[563280]:   File "/usr/lib64/python3.6/logging/__init__.py", line 994, in emit
	daemonAdapter[563280]:     msg = self.format(record)
	daemonAdapter[563280]:   File "/usr/lib64/python3.6/logging/__init__.py", line 840, in format
	daemonAdapter[563280]:     return fmt.format(record)
	daemonAdapter[563280]:   File "/usr/lib64/python3.6/logging/__init__.py", line 577, in format
	daemonAdapter[563280]:     record.message = record.getMessage()
	daemonAdapter[563280]:   File "/usr/lib64/python3.6/logging/__init__.py", line 338, in getMessage
	daemonAdapter[563280]:     msg = msg % self.args
	daemonAdapter[563280]: TypeError: must be real number, not Error
	daemonAdapter[563280]: Call stack:
	daemonAdapter[563280]:   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
	daemonAdapter[563280]:     self._bootstrap_inner()
	daemonAdapter[563280]:   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
	daemonAdapter[563280]:     self.run()
	daemonAdapter[563280]:   File "/usr/lib64/python3.6/threading.py", line 864, in run
	daemonAdapter[563280]:     self._target(*self._args, **self._kwargs)
	daemonAdapter[563280]:   File "/usr/lib/python3.6/site-packages/vdsm/common/concurrent.py", line 260, in run
	daemonAdapter[563280]:     ret = func(*args, **kwargs)
	daemonAdapter[563280]:   File "/usr/lib/python3.6/site-packages/vdsm/common/logutils.py", line 373, in _run
	daemonAdapter[563280]:     self._target.handle(record)
	daemonAdapter[563280]: Message: 'Ignoring scsi_id failure for device %s: %e'
	daemonAdapter[563280]: Arguments: ('/dev/dm-14', Error(['/usr/lib/udev/scsi_id', '--page=0x80',
	'--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-14'], 1, b'', b''))

	I think this is just a logging issue, failing to format the log message
	in the logfile thread, so this is just unwanted noise, but I'm not sure.

	I'm not sure why the exception is logged to the journal; maybe this is a
	fallback in logging when formatting a record fails.

	Bug-Url: https://bugzilla.redhat.com/1860716
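
	The bug boils down to %e being the float-exponent conversion. A
	self-contained sketch (Error here is a stand-in for
	cmdutils.Error):

```python
class Error(Exception):
    pass


device = "/dev/dm-14"
err = Error("scsi_id failed")

# Broken: %e expects a real number, so formatting an Error instance
# raises "TypeError: must be real number, not Error" in the logging
# thread.
try:
    "Ignoring scsi_id failure for device %s: %e" % (device, err)
    raised = False
except TypeError:
    raised = True

# Fixed: %s formats any object via str().
msg = "Ignoring scsi_id failure for device %s: %s" % (device, err)
```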

2020-07-27  Milan Zamazal  <mzamazal@redhat.com>

	spec: Depend on libvirt >= 6.0.0-17 on CentOS
	Bug-Url: https://bugzilla.redhat.com/1840414

2020-07-24  Eyal Shenitzky  <eshenitz@redhat.com>

	backup_test.py: validate backup XML in backupBegin
	Add a verification for the backup XML to FakeDomainAdapter.backupBegin().

	It allows testing the backup XML in a real flow behavior, so now the
	specific test for backup XML is redundant.

2020-07-23  Nir Soffer  <nsoffer@redhat.com>

	qemuimg: Put all options before file names
	Specify all command line options before the file names for easier
	debugging. Now it is possible to look up all the options quickly.

	Here is an example command (copied from the tests):

	    /usr/bin/taskset --cpu-list 0-7 /usr/bin/nice -n 19 /usr/bin/ionice
	    -c 3 /usr/bin/qemu-img convert -p -t none -T none -f qcow2 -O raw
	    /var/tmp/tmpkl2q92rb/src /var/tmp/tmpkl2q92rb/dst (cwd None)

	Bug-Url: https://bugzilla.redhat.com/1850267

	qemuimg: Do not use creation options when skipping image creation
	This is not well documented but the -o option in "qemu-img convert" is
	for image creation options. When we skip image creation with -n the
	options are ignored.

	Since qemu-img 4.2 using -n together with -o logs this warning:

	    qemu-img: warning: -o has no effect when skipping image creation
	    qemu-img: warning: This will become an error in future QEMU versions.

	Let's eliminate this before this becomes an error.

	Bug-Url: https://bugzilla.redhat.com/1850267
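
	A sketch of building the command with all options before the file
	names, and never mixing -n with creation options (QEMU_IMG and the
	helper are assumptions, not vdsm's qemuimg API):

```python
QEMU_IMG = "/usr/bin/qemu-img"


def convert_cmd(src, dst, src_fmt, dst_fmt, create=True):
    cmd = [QEMU_IMG, "convert", "-p", "-t", "none", "-T", "none",
           "-f", src_fmt, "-O", dst_fmt]
    if not create:
        # -n skips image creation; -o creation options would be
        # ignored with it (a warning since qemu-img 4.2), so they are
        # never added in this case.
        cmd.append("-n")
    # File names come last, so all options can be read at a glance.
    cmd.extend([src, dst])
    return cmd
```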

	tests: Test convert to qcow2 compat=0.10
	When converting to the old qcow2 compat=0.10 format without a backing
	file and with create=False, qemu-img allocates the entire image. Add
	tests verifying that using create=True avoids this issue.

	There are 2 use cases:
	- Convert base volume - we had a test for this, but we tested only
	  compat=1.1. Add parameter for testing also compat=0.10.
	- Collapse chain to target qcow2. Add new test for this use case testing
	  both compat=1.1 and compat=0.10.

	Bug-Url: https://bugzilla.redhat.com/1850267
	Related-To: https://bugzilla.redhat.com/1858632

	volume: Be more careful with create=False
	Recently we started to use create=False when converting images to block
	storage. This was needed as a workaround for qemu-img bug that is now
	fixed.

	Then we found that qemu-img preallocation is inefficient and causes
	trouble with legacy NFS storage, and now we preallocate images with our
	fallocate helper.

	These changes revealed the fact that we always create the target image
	when running qemuimg.convert(). This does not make sense for our use
	case, so we switch to create=False for all cases.

	Unfortunately, this does not work. We have 2 cases that require
	create=True:

	1. Raw sparse images when the file system does not support punching
	   holes. When qemu-img convert is trying to punch holes in unallocated
	   areas it falls back to writing zeroes, which is very slow, and fully
	   allocates sparse images.

	2. qcow2 with compat=0.10 when the volume does not have a parent. Since
	   this older qcow2 format does not support zero clusters, qemu-img falls
	   back to writing zeroes, which is very slow and allocates the entire image.

	When qemu-img convert creates a new image, it knows that the image is
	zeroed so it can skip unallocated areas.

	Here are few examples showing the problem cases:

	Copying a sparse image to NFS 3:

	    $ truncate -s 10g /var/tmp/src.raw
	    $ truncate -s 10g dst.raw

	    $ time qemu-img convert -f raw -O raw -t none -T none -n /var/tmp/src.raw dst.raw

	    real    0m50.684s
	    user    0m0.034s
	    sys     0m0.711s

	    $ du -sh dst.raw
	    10G     dst.raw

	The image became fully allocated and the operation was very slow. If
	we use create=True:

	    $ time qemu-img convert -f raw -O raw -t none -T none /var/tmp/src.raw dst.raw

	    real    0m0.222s
	    user    0m0.005s
	    sys     0m0.003s

	    $ du -sh dst.raw
	    4.0K    dst.raw

	More correct and 250 times faster.

	Copying image to qcow2 compat=0.10:

	    $ qemu-img create -f qcow2 -o compat=0.10 dst.qcow2 10g

	    $ time qemu-img convert -f raw -O qcow2 -t none -T none -n /var/tmp/src.raw dst.qcow2

	    real    0m58.734s
	    user    0m0.049s
	    sys     0m0.673s

	    # du -sh dst.qcow2
	    11G     dst.qcow2

	Again the image was fully allocated, slowly. If we use create=True:

	    $ time qemu-img convert -f raw -O qcow2 -t none -T none /var/tmp/src.raw dst.qcow2

	    real    0m0.224s
	    user    0m0.003s
	    sys     0m0.006s

	    $ du -sh dst.qcow2
	    196K    dst.qcow2

	The creation logic is needed in the 3 callers of qemuimg.convert(), and
	is mostly about the volume properties. Add Volume.requires_create()
	method returning True if using the volume as target image in
	qemuimg.convert() requires create=True.

	The issue with qcow2 compat=0.10 format is fixed upstream. Once
	qemu-5.1.0 is available we can remove this check and handle only raw
	sparse images.

	Bug-Url: https://bugzilla.redhat.com/1850267
	Related-To: https://bugzilla.redhat.com/1858632
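
	The two cases can be sketched as a predicate (format names and the
	signature are hypothetical, not Volume.requires_create()'s real
	API):

```python
RAW, COW = "raw", "cow"


def requires_create(vol_format, qcow2_compat=None, has_parent=False,
                    fs_supports_punch_hole=True):
    # Case 1: raw sparse image on a file system that cannot punch
    # holes; qemu-img would write zeroes and fully allocate it.
    if vol_format == RAW and not fs_supports_punch_hole:
        return True
    # Case 2: qcow2 compat=0.10 without a parent; the old format has
    # no zero clusters, so unallocated areas are written as zeroes.
    if (vol_format == COW and qcow2_compat == "0.10"
            and not has_parent):
        return True
    return False
```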

	tests: Add failing tests for copy_data
	When converting to qcow2 compat=0.10, qemu allocates the entire image.
	The current tests missed this, since they checked only some of the data
	instead of the entire image.

	Using qemu-img compare -s, we verify both the content and the allocation
	of the destination image. This exposes the failure when converting to
	older domains using compat=0.10.

	To mark only 2 permutations as failure we need to convert the test to
	pytest and move it out of the class since pytest parameters do not work
	in unittest.TestCase subclass.

	The test for collapsing an image was broken, always testing a qcow2
	destination image. The error was discovered by comparing with
	qemuimg.compare() while specifying the image format.

	When testing collapse, strict mode does not work, but we can compare the
	source and destination actual sizes to detect wrong allocation.

	Finally, to get consistent results with qemu-img compare strict mode, use
	userstorage. Since we also test block storage here, and it is not
	compatible with 4k storage, use storage with a sector size of 512 bytes.

	Bug-Url: https://bugzilla.redhat.com/1850267

2020-07-23  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: add qemu-ga context to calls from vm.py

	qga: add context for setting libvirt timeout
	With libvirt 5.10 we get the ability to configure a timeout for qemu-ga
	commands. This is important for stats collection, because one rogue guest
	can block collection for other VMs. The complication with the API is
	that it is not per-command but sets the timeout globally for all commands
	of the VM. But for the commands not related to stats collection we still
	want to rely on the defaults.

	Here we introduce a context for setting the timeout on commands. It
	internally serializes the request by using a lock mechanism.
	The timeout is only valid for the duration of the context and is reset
	to `default` when context ends.
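
	The context can be sketched like this (class and method names are
	hypothetical; the real code calls libvirt's per-domain qemu-ga
	timeout API):

```python
import threading
from contextlib import contextmanager

DEFAULT_TIMEOUT = 30  # assumed default, in seconds


class GuestAgentChannel:
    def __init__(self):
        self._lock = threading.Lock()
        self.current_timeout = DEFAULT_TIMEOUT

    def _set_timeout(self, seconds):
        # Stand-in for the libvirt call; the timeout is global for all
        # qemu-ga commands of the VM, not per command.
        self.current_timeout = seconds

    @contextmanager
    def timeout(self, seconds):
        # Serialize timeout changes with a lock and restore the
        # default when the context ends.
        with self._lock:
            self._set_timeout(seconds)
            try:
                yield
            finally:
                self._set_timeout(DEFAULT_TIMEOUT)


ch = GuestAgentChannel()
with ch.timeout(5):
    inside = ch.current_timeout
after = ch.current_timeout
```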

	qga: remove old code querying qemu-ga directly
	Libvirt requirement has been bumped so we don't need to keep the old
	interface anymore.

	automation: update libvirt packages
	update libvirt packages; virt test suite needs at least version 6.0.0

2020-07-23  Ales Musil  <amusil@redhat.com>

	net, tests: Remove create_tap function
	Remove create_tap from netfunctestlib as it is not used anymore.

	net, tests, nmstate: Replace tap usage with dummy
	Nmstate does not generally support the usage of tap devices, which in
	0.3 leads to verification failures. Use a dummy device, which is
	hidden from refresh caps, instead of a tap in the tests.

	net, tests: Clearly separate test and setup in net_basic_test

	net, tests: Clearly separate test and setup in bridge_test

2020-07-22  Milan Zamazal  <mzamazal@redhat.com>

	machinetype: Add spec_ctrl feature also to new CPUs without -IBRS
	New CPU models are available that have the spec_ctrl feature by
	default. With more of those CPUs coming, we need a mechanism to detect
	the feature without checking for the -IBRS suffix in the model name,
	which is not present in new models.

	Let's use a heuristic where we look for the -IBRS suffix in any of the
	compatible CPU models.
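
	The heuristic boils down to a one-liner (a sketch; the real function
	name in machinetype.py may differ):

```python
def has_spec_ctrl(compatible_models):
    # Heuristic: if any compatible model name carries the -IBRS suffix,
    # the host CPU family supports spec_ctrl, even when the selected
    # (newer) model name itself no longer includes the suffix.
    return any(name.endswith("-IBRS") for name in compatible_models)
```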

	Bug-Url: https://bugzilla.redhat.com/1854922

2020-07-22  Ales Musil  <amusil@redhat.com>

	net, tests: Enable IPv6 inside container
	The container that runs on Travis has IPv6 disabled [0].
	A workaround to enable it is to write into sysctl
	before the start of any test.

	This is helpful for nmstate because they are using
	Travis as their CI and run functional tests regularly.

	[0] https://github.com/travis-ci/travis-ci/issues/8891

2020-07-21  Amit Bawer  <abawer@redhat.com>

	taskManager: Use internal method for already found task items
	Minor enhancements to use the internal info or status getters
	for an already found item instead of having to fetch it again from
	the tasks dict using the same taskID key.

	resourceManager: iterate resources values without six
	six.iteritems is not required anymore, so we can drop it now.

	monitor: Add monitors dictionary lock
	Iterating over the monitors dict while other methods can
	remove or add monitors to it can result in a RuntimeError
	for python3.

	The resolution in this case is to add a class lock for the
	monitors dict to prevent its modifications from one call
	while it is iterated from another.

	For longer operations like shutdown we add a shutdown flag to indicate
	to the stop or start monitoring methods that a shutdown is taking
	place, in which case they abort by raising ShuttingDownError. That
	prevents any monitors dictionary modifications while shutdown is in
	progress outside the lock, allowing any other read-only dict
	operations to take place.
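
	The locking scheme could be sketched like this (names are
	assumptions; the real monitor module is more involved):

```python
import threading

class ShuttingDownError(Exception):
    pass

class MonitorSet:
    # A single lock guards the monitors dict; a shutdown flag makes
    # start/stop abort early instead of mutating the dict mid-shutdown.
    def __init__(self):
        self._lock = threading.Lock()
        self._monitors = {}
        self._shutting_down = False

    def start_monitoring(self, sd_id, monitor):
        with self._lock:
            if self._shutting_down:
                raise ShuttingDownError
            self._monitors[sd_id] = monitor

    def stop_monitoring(self, sd_id):
        with self._lock:
            if self._shutting_down:
                raise ShuttingDownError
            self._monitors.pop(sd_id, None)

    def shutdown(self):
        # Set the flag inside the lock, then stop monitors outside it,
        # so read-only dict access is not blocked for the whole shutdown.
        with self._lock:
            self._shutting_down = True
            monitors = list(self._monitors.values())
        for monitor in monitors:
            monitor.stop()
```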

2020-07-20  Steven Rosenberg  <srosenbe@redhat.com>

	kvm: Avoid sparse imports to block storage
	This fix checks if the destination storage
	domain is block storage. The Vdsm sparse download
	implementation does not support block storage.

	Previously, the functionality was being blocked
	within the engine.

	Bug-Url: https://bugzilla.redhat.com/1663135

2020-07-20  Amit Bawer  <abawer@redhat.com>

	misc: iterate over dict keys list while removing its items
	Avoid hitting a RuntimeError on python3 when removing items
	from a dict while iterating its items view.
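
	A minimal illustration of the fix pattern, iterating over a snapshot
	of the keys instead of the live view:

```python
def prune(d, should_remove):
    # list(d) takes a snapshot of the keys; deleting from the dict while
    # iterating d (or d.items()) directly raises RuntimeError on python3.
    for key in list(d):
        if should_remove(d[key]):
            del d[key]
```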

	taskManager: Use the lock for tasks dict
	Use the class lock to protect the managed tasks dict
	from concurrent operations modifying and iterating over
	the same dict.

2020-07-20  Nir Soffer  <nsoffer@redhat.com>

	travis: Remove Fedora 30 build
	Fedora 30 is EOL; there is no point in running the tests with it. We need to
	add a Fedora 32 image but I'm not sure we have all the dependencies.

	docker: Add missing packages for lint target
	Add packages required for the lint target, previously available only in
	Fedora 30 used for linting. This will allow using CentOS 8 for linting.

2020-07-19  Nir Soffer  <nsoffer@redhat.com>

	copy_data: Log copy time
	We want to log the time for long storage operations. This was already
	done for copying volumes in image.py but for some reason missing in
	copy_data.

	image: Log copy volume timing in INFO logs
	Similar to other operations like preallocating volumes, we want to know
	how much time a long operation took.

2020-07-17  Tomáš Golembiovský  <tgolembi@redhat.com>

	spec: simplify libvirt requirements and bump version
	We need to bump the libvirt version requirement to at least 5.10 for
	access to the API for setting a timeout on QEMU-GA commands. There is
	already 6.0.0 in CentOS and RHEL. On Fedora there is 6.1.0 in FC32.

2020-07-17  Milan Zamazal  <mzamazal@redhat.com>

	virt: Prevent migration monitoring race and traceback
	Migration monitoring thread checks for end of migration threading
	event and then retrieves the migration job stats.  There is a timing
	issue there: If the migration finishes and the source VM is destroyed
	just before the stats are retrieved, the job stats retrieval will
	raise an exception.  The exception is harmless, it just causes the
	monitoring thread to finish, which should happen to it anyway.  But it
	logs the error and traceback in Vdsm log.

	This patch prevents the error and traceback logging by handling the
	exception.

2020-07-17  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: add partition stripping for libvirt backend
	This was originally added to poller in 66dd1d84 but was missed later
	when adding the libvirt backend. What was returned from qemu-ga was
	inconsistent with what we returned with ovirt-guest-agent. Instead of
	mapping disk devices to serial numbers we in fact mapped a partition,
	e.g. /dev/sda1 instead of /dev/sda.

	The current behavior did not make much sense anyway. Suppose there were
	two partitions with filesystems on the disk, /dev/sda1 and /dev/sda2;
	then either of those could get associated with the disk serial number.

	Bug-Url: https://bugzilla.redhat.com/1793290

	automation: use CentOS repo for advanced virtualization

2020-07-16  Andrej Cernek  <acernek@redhat.com>

	net, tests: fix getLink test
	The isBRIDGE method of Link has been mistakenly used as if it were a
	property. This caused the assert to always pass, no matter the state
	of the link.

2020-07-16  Bell Levin  <blevin@redhat.com>

	net: Add speed read of bond mode 3
	Bond mode 3 speed was not read up until now; it was skipped due
	to the lack of an active slave, since mode 3 (broadcast) does not have
	an active slave, so speed 0 was returned.

	This patch adds the display of the bond speed by picking
	the lowest speed from all slaves.
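
	A sketch of the mode 3 speed calculation (the function name is an
	assumption):

```python
def broadcast_bond_speed(slave_speeds):
    # Mode 3 (broadcast) transmits on all slaves, so the effective speed
    # is bounded by the slowest slave. Slaves reporting no speed (0) are
    # ignored; with no usable slaves, report 0 as before.
    speeds = [speed for speed in slave_speeds if speed > 0]
    return min(speeds) if speeds else 0
```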

	Bug-Url: https://bugzilla.redhat.com/1790747

2020-07-15  Bell Levin  <blevin@redhat.com>

	net: Change identical name of module and function
	The name of the module is "speed.py" and the name of the function
	inside it is "speed()". This creates a problem when trying to
	mock anything inside the speed function.

	Changing the name of the module fixes the issue since mock can
	differentiate between the two.

2020-07-14  Nir Soffer  <nsoffer@redhat.com>

	fileVolume: Unify preallocation logging
	The stopwatch log when extending preallocated volume was logged only in
	DEBUG level and used the default logger. Use INFO level and FileVolume
	logger, similar to the way we log preallocation of entire volume.

	Bug-Url: https://bugzilla.redhat.com/1850267

	fileVolume: Improve logging when creating raw volumes
	Log "Request to create RAW volume ..." before creating the volume, for
	consistency with other logs.

	Log the stopwatch messages for creating the image and preallocating the
	volume in INFO level and using FileVolume logger so these logs are
	available in the default log level. This will make it easier to diagnose
	performance issues with preallocated volumes.

	Bug-Url: https://bugzilla.redhat.com/1850267

	fileSD: Do not use qemu-img preallocation=falloc
	We used to defer volume preallocation when creating a raw preallocated
	volume by supporting initial_size=1. Then when copying the volume, we
	used:

	    qemu-img convert ... -o preallocation=falloc src dst

	Turns out that this does not work well for NFS < 4.2 and can lead to
	sanlock timeouts, storage monitoring failures, and VMs becoming
	non-responsive. It is also slower than preallocating volumes using dd,
	as was done in the past.

	To fix this, we drop the idea of deferring preallocation. The initial
	size sent by the engine is now ignored when creating a raw
	preallocated volume, and we always preallocate the volume at the
	creation step.

	Creating a raw preallocated volume is done in 2 steps:

	1. Create sparse image using qemu-img create, since qemu-img allocates
	   the first block by writing zeroes.

	2. Allocate the rest of the image using fallocate helper, which was
	   changed to use efficient and safe method on any NFS version.

	When we copy images, we always use the -n option to use the existing
	volume:

	    qemu-img convert ... -n src dst

	This keeps the target volume preallocated. This was already used for
	block storage, and now used also for file storage.

	When using create=False, dstQcow2Compat and backingFormat are ignored,
	so these options were removed.

	Bug-Url: https://bugzilla.redhat.com/1850267

	tests: storagetestlib: Fix creation of file volumes
	When creating file volumes we always created a sparse volume, which
	does not match the real code that uses qemu-img create in all cases.

	We had a wrong test checking for zero allocation for a new empty image,
	while in reality an empty image always allocates one file system block
	when using qemu-img create.

	Bug-Url: https://bugzilla.redhat.com/1850267

	helpers: fallocate: Support human size
	When specifying offset and size, support the t|g|m|k suffixes like the
	qemu-img and fallocate commands. This makes it easier to test the
	tool manually.
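
	The suffix handling could look like this (a sketch; the helper's real
	parsing code may differ):

```python
SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_size(value):
    # Accept a plain number of bytes, or a number with a t|g|m|k suffix
    # (case-insensitive), as the qemu-img and fallocate commands do.
    value = value.lower()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)
```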

	Bug-Url: https://bugzilla.redhat.com/1850267

	helpers: fallocate: Replace posix_fallocate
	Turns out that posix_fallocate() is very slow, and when used by
	qemu-img with OFD locking, creating a preallocated volume in qemu-img
	create or convert causes long delays when accessing storage, which
	breaks sanlock delta lease renewals, storage monitoring, and VM
	monitoring.

	Both the fallocate helper and qemu-img create and convert use
	posix_fallocate(). We can fix this issue by replacing it with a better
	implementation.

	Replace posix_fallocate() in the fallocate helper with fallocate(). On
	NFS 4.2, GlusterFS, XFS and ext4, fallocate() is practically free,
	allocating file space without any I/O.

	If fallocate() is not available or not supported by the underlying file
	system, fall back to writing zeroes using direct I/O. Using direct I/O
	should avoid I/O delays in other programs, and is faster.

	Testing with NFS 3 storage shows that the fallocate helper is now 2.5
	times faster, uses 12 times less CPU time, and issues 62 times fewer
	NFS calls compared with qemu-img create.

	The results are promising, but more testing is needed with real NFS
	server and real network.

	command        real(s)   user(s)   sys(s)   nfs calls
	-----------------------------------------------------
	fallocate[1]    31.92      0.05     0.73        42257
	qemu-img[2]     81.30      1.25     8.76      2637797

	[1] fallocate command:

	$ for i in $(seq 10); do \
	    sleep 1; \
	    time nice -n 19 ionice -c 3 ~/fallocate 10g test.img; \
	    rm -f test.img; \
	    sync; \
	done

	[2] qemu-img create command:

	$ for i in $(seq 10); do \
	    sleep 1; \
	    time nice -n 19 ionice -c 3 qemu-img create -f raw -o preallocation=falloc test.img 10g; \
	    rm -f test.img; \
	    sync; \
	done

	Bug-Url: https://bugzilla.redhat.com/1850267

	network: nmstate: Package vdsm/network/nmstate
	Since nmstate was moved to new package in:

	commit 15f0bd04c3ef8daf32fb75bdf51839aef241b185
	Author: Ales Musil <amusil@redhat.com>
	Date:   Thu Jun 25 11:59:30 2020 +0200

	    net, nmstate: Move nmstate to its own module

	supervdsm fails to start with this error:

	    ImportError: cannot import name 'nmstate'

	Fix by adding the necessary automake and autoconf magic so the new code
	is packaged and installed on the host.

2020-07-13  Nir Soffer  <nsoffer@redhat.com>

	static: imageio: Minor text issues in imageio configuration
	Add a missing "is" and fix a few typos.

2020-07-13  Amit Bawer  <abawer@redhat.com>

	lvm: Use dict.copy() instead of dict() for copies
	The copy() method is atomic, whereas using dict() for a copy is not
	safe when different flows modify the same dict while it is being
	iterated for the copy.

	Bug-Url: https://bugzilla.redhat.com/1856065

2020-07-13  Ales Musil  <amusil@redhat.com>

	net, nmstate: Move LinuxBridgeNetwork to its own module
	LinuxBridgeNetwork is one type of network that
	is allowed to be configured in vdsm. For this reason
	it is moved to its own module to have a clear
	separation.

	net, nmstate: Move IP address state to its own module
	IpAddress class is switch agnostic. Move it to its
	own module.

	net, nmstate: Move Routes to its own module
	The Routes class is switch agnostic, so it can be
	moved to its own module.

	net, nmstate: Move useful functions to util module
	Move functions that are universal enough to be used
	by multiple modules to util.

	Move NetworkConfig class to util.

	net, nmstate: Move Bond to its own module
	The Bond class is isolated enough that we can
	move it to its own module for clarity and easier
	navigation.

	net, nmstate: Remove import of compat libs
	Vdsm has not supported py2 since 4.4, thus we don't
	require imports of compat features or the usage of six.

	Replace six usage with py3 syntax.

	net, nmstate: Use common nmstate schema import
	Because the import of the schema is done in multiple places,
	use a common module.

	net, nmstate: Move nmstate to its own module
	Throughout development nmstate grew to a point
	where it started to become messy and harder to navigate.

	Prepare for module split by moving nmstate code into
	its own submodule.

	net, nmstate, test: Reuse RunningConfig and current_state mocks
	Move current_state_mock to conftest instead of
	defining it in every file.

	Move rconfig_mock to conftest instead of defining it
	for every function that needs it. As a bonus, a polluted
	environment can no longer break the test.

	current_state_mock is used by every nmstate test.
	Move it to conftest instead of defining it in every file.

	net, nmstate, test: Move linux bridge nmstate tests to module
	Nmstate tests have grown to a point where they deserve their
	own module hierarchy rather than an isolated file.
	This work was started by the separation of testlib and vlan
	testing.

	Now, with the preparation for OvS, it's a good time to move the
	linux bridge tests and mark them accordingly.

	net, nmstate, test: Move nmstate bond tests to separate module
	As Bond state generation is isolated from Network,
	there is no point keeping it in a single huge test file.

	This can also help to identify some missing pieces
	that might not be tested yet.

2020-07-12  Nir Soffer  <nsoffer@redhat.com>

	spec,docker,automation: Require lsof
	We require lsof now, but it was not added to the spec, automation and
	the docker files. We were lucky that the package is available
	everywhere except the CentOS 8 container image.

	Add the requirements so changes in other packages requiring lsof will
	not break vdsm.

	This change completes:

	commit 31c2202fa78f159b61e84154a4c7a516b0f47bba
	Author: Amit Bawer <abawer@redhat.com>
	Date:   Thu Jul 2 16:06:23 2020 +0300

	    lsof: Add lsof supervdsm module

	Bug-Url: https://bugzilla.redhat.com/1854050

	travis: Upgrade distribution to bionic
	Upgrading to bionic fixes random failures when creating a loop device
	with a sector size of 4k.

	With this patch, all loop devices are created successfully[1]:

	python3 tests/storage/userstorage.py setup
	[userstorage] INFO    Creating backing file /var/tmp/vdsm-storage/backing.file-512
	[userstorage] INFO    Creating loop device /var/tmp/vdsm-storage/loop.file-512
	[userstorage] INFO    Creating filesystem /var/tmp/vdsm-storage/mount.file-512
	[userstorage] INFO    Creating file /var/tmp/vdsm-storage/mount.file-512/file
	[userstorage] INFO    Creating backing file /var/tmp/vdsm-storage/backing.file-4k
	[userstorage] INFO    Creating loop device /var/tmp/vdsm-storage/loop.file-4k
	[userstorage] INFO    Creating filesystem /var/tmp/vdsm-storage/mount.file-4k
	[userstorage] INFO    Creating file /var/tmp/vdsm-storage/mount.file-4k/file
	[userstorage] INFO    Creating backing file /var/tmp/vdsm-storage/backing.mount-512
	[userstorage] INFO    Creating loop device /var/tmp/vdsm-storage/loop.mount-512
	[userstorage] INFO    Creating filesystem /var/tmp/vdsm-storage/mount.mount-512
	[userstorage] INFO    Creating backing file /var/tmp/vdsm-storage/backing.mount-4k
	[userstorage] INFO    Creating loop device /var/tmp/vdsm-storage/loop.mount-4k
	[userstorage] INFO    Creating filesystem /var/tmp/vdsm-storage/mount.mount-4k

	Without this patch, the system fails to create a loop device with a 4k
	sector size[2]:

	python3 tests/storage/userstorage.py setup
	[userstorage] INFO    Creating backing file /var/tmp/vdsm-storage/backing.file-512
	[userstorage] INFO    Creating loop device /var/tmp/vdsm-storage/loop.file-512
	[userstorage] INFO    Creating filesystem /var/tmp/vdsm-storage/mount.file-512
	[userstorage] INFO    Creating file /var/tmp/vdsm-storage/mount.file-512/file
	[userstorage] INFO    Creating backing file /var/tmp/vdsm-storage/backing.file-4k
	[userstorage] INFO    Creating loop device /var/tmp/vdsm-storage/loop.file-4k
	losetup: /dev/loop1: set logical block size failed: Resource temporarily unavailable
	Traceback (most recent call last):
	  File "tests/storage/userstorage.py", line 296, in <module>
	    main()
	  File "tests/storage/userstorage.py", line 275, in main
	    setup(args)
	  File "tests/storage/userstorage.py", line 284, in setup
	    p.setup()
	  File "tests/storage/userstorage.py", line 202, in setup
	    self._mount.setup()
	  File "tests/storage/userstorage.py", line 142, in setup
	    self._loop.setup()
	  File "tests/storage/userstorage.py", line 81, in setup
	    device = self._create_loop_device()
	  File "tests/storage/userstorage.py", line 116, in _create_loop_device
	    out = subprocess.check_output(cmd)
	  File "/usr/lib64/python3.7/subprocess.py", line 411, in check_output
	    **kwargs).stdout
	  File "/usr/lib64/python3.7/subprocess.py", line 512, in run
	    output=stdout, stderr=stderr)
	subprocess.CalledProcessError: Command '['sudo', 'losetup', '-f',
	'/var/tmp/vdsm-storage/backing.file-4k', '--show', '--sector-size',
	'4096']' returned non-zero exit status 1.

	Looks like the newer kernel in bionic (5.3.0) works better for us.

	[1] https://travis-ci.org/github/nirs/vdsm/jobs/707361262
	[2] https://travis-ci.org/github/nirs/vdsm/jobs/707420014

2020-07-12  Amit Bawer  <abawer@redhat.com>

	lvm: Avoid iterating pvs dict without locking
	removeVG performs a tuple comprehension outside of LVMCache
	over the pvs dict to find the PVs of the VG to be invalidated
	after the VG was removed from lvm.

	This can be unsafe since the pvs dict is common to all VGs in the
	cache, and if another VG operation modifies the pvs dict during the
	tuple comprehension we can get a RuntimeError.

	Remove the tuple comprehension and use the lock protected
	_invalidatevgpvs method for invalidating the removed VG's pvs.

	Bug-Url: https://bugzilla.redhat.com/1856065

	lvm: Remove vgs from vgs dict while under lock
	Eliminate unsafe condition for modifying the vgs dict
	outside of LVMCache by moving vg removals to a private
	lock protected method.

	Bug-Url: https://bugzilla.redhat.com/1856065

	lvm: Remove lvs from lvs dict while under lock
	Removing items directly from the lvs dict outside LVMCache
	causes a RuntimeError for modifying a dictionary while
	other code iterates the dict:

	  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 734, in _reloadlvs
	    staleLVs = [lvName for v, lvName in self._lvs
	  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 734, in <listcomp>
	    staleLVs = [lvName for v, lvName in self._lvs
	RuntimeError: dictionary changed size during iteration

	By moving the lvs removal operation into a dedicated
	private method in LVMCache protected by the same lock
	used for other operations accessing the same dict we
	eliminate this unsafe condition.

	Bug-Url: https://bugzilla.redhat.com/1856065

2020-07-12  Nir Soffer  <nsoffer@redhat.com>

	travis: Create loop devices before build
	Looks like we don't have enough loop devices in the container. We had
	the same issue in oVirt CI, which was solved by creating more device
	nodes in the chroot.

2020-07-09  Tomáš Golembiovský  <tgolembi@redhat.com>

	virt: do not error when ovirt-guest-agent channel is not configured
	Do not invent a path for a channel when it is not configured in the
	domain XML and don't try to open such a non-existent crafted path.
	Also handle cleanup properly in case no channel is configured.
	Finally, fix the test for the guest agent socket to actually test
	something meaningful.

	Bug-Url: https://bugzilla.redhat.com/1779527

2020-07-08  Nir Soffer  <nsoffer@redhat.com>

	contrib: create-target: Disable write_back
	By default the fileio backstore uses write_back=true. This may improve
	performance by using the file system cache, but it increases the
	chance of data loss. I also experienced stability issues with targets
	using write_back=true. Let's disable write_back by default.

	Related-To: https://bugzilla.redhat.com/1851023

2020-07-06  Amit Bawer  <abawer@redhat.com>

	lvm: Add processes info to LV deactivations error
	Add information about the processes currently using the LV, which
	prevent it from being deactivated, to the error information.

	Example trace in case of deactivation failure with processes using the
	LV:

	vdsm.storage.exception.CannotDeactivateLogicalVolume:
	Cannot deactivate Logical Volume:
	'error=General Storage Exception: ("5 [] [\'  Logical volume 564d342f-caa1-4b9e-8db1-b4708e17b13c/a682429d-d4a2-46a3-8c8f-458303ac0fde in use.\']\\n564d342f-caa1-4b9e-8db1-b4708e17b13c/[\'a682429d-d4a2-46a3-8c8f-458303ac0fde\']",), users={\'/dev/564d342f-caa1-4b9e-8db1-b4708e17b13c/a682429d-d4a2-46a3-8c8f-458303ac0fde\': [{\'pid\': 77707, \'command\': \'pytest\', \'user\': \'root\', \'fd\': 8}]}'

	Bug-Url: https://bugzilla.redhat.com/1854050

	lsof: Add lsof supervdsm module
	This allows gathering process information for a given path.
	lsof.proc_info(path) takes a path and produces an iterable which
	in turn has a record entry for each process using the given path:

	{'command': 'pytest', 'fd': 8, 'pid': 65239, 'user': 'root'}
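
	Producing such records could be sketched by parsing lsof's
	machine-readable output (the field mapping below is an assumption
	about how the module parses `lsof -F` output; the real implementation
	may differ):

```python
def parse_lsof_fields(data):
    # Parse `lsof -F pcLf` style output: one field per line, where the
    # first character is the field tag (p=pid, c=command, L=login name,
    # f=fd). A new "p" line starts the record for the next process.
    tags = {"p": "pid", "c": "command", "L": "user", "f": "fd"}
    record = {}
    for line in data.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p" and record:
            yield record
            record = {}
        if tag in tags:
            # pid and fd are numeric fields.
            record[tags[tag]] = int(value) if tag in "pf" else value
    if record:
        yield record
```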

	Bug-Url: https://bugzilla.redhat.com/1854050

2020-07-05  Vojtech Juranek  <vjuranek@redhat.com>

	storage: deactivate VG on storage teardown
	Currently we deactivate unused LVs upon block SD teardown. This
	actually doesn't do what we want - deactivating the whole SD. It just
	deactivates unused LVs. Replace it with deactivation of the whole VG.
	In case of failure, the error is logged by the caller.

	Bug-Url: https://bugzilla.redhat.com/1850458

	lvm: don't check VG existence when deactivating it
	Don't check if the VG exists when deactivating it. This check may fail
	(e.g. when storage is not available) and we would end up with a VG
	which is still active. If deactivation of the VG fails, the caller has
	to handle the error.

	storage: improve error logging for SD teardown
	Add exception to the error log when tear down of SD fails.

	storage: improve logging when removing device mappings
	Add an info log for every attempt to remove a device mapping with
	dmsetup remove. If this removal fails, don't just ignore the failure
	in lvm.removeVgMapping(), but log an error message.

	lvm.removeVgMapping() is now unused, but we may use it in the future
	(if not, it will be removed).

2020-07-03  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.22

2020-07-03  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Modified check for ipv4 fqdn instead of ipv6
	A change was added in patch https://gerrit.ovirt.org/109360 where an
	ipv6 FQDN is resolved with
	socket.getaddrinfo(address, None, family=socket.AF_INET6) and
	'--inet6'. However, it has been noticed that with hosts having dual
	NICs (both ipv4), ipv6 was resolved via the FQDN.
	This patch fixes the above by checking ipv4 first. If it is not
	resolved, ipv6 is tried and the corresponding `--inet6` is added to
	the gluster command.
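
	The ipv4-first check could be sketched like this (a sketch; the real
	gluster CLI module may structure it differently):

```python
import socket

def inet6_args(address):
    # Try IPv4 resolution first; only when the address does not resolve
    # over IPv4 fall back to IPv6 and add --inet6 to the gluster command.
    try:
        socket.getaddrinfo(address, None, family=socket.AF_INET)
        return []
    except socket.gaierror:
        # Raises gaierror again if the address is not IPv6 either.
        socket.getaddrinfo(address, None, family=socket.AF_INET6)
        return ["--inet6"]
```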

	Bug-Url: https://bugzilla.redhat.com/1841076

2020-06-30  Sandro Bonazzola  <sbonazzo@redhat.com>

	automation: add CodeQL analysis
	Adding security scanning with CodeQL
	See https://help.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning

2020-06-30  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.21

2020-06-30  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: fix access to ipaddress properties
	Those are properties, not functions.

2020-06-30  Nir Soffer  <nsoffer@redhat.com>

	lvm: Disable obtain_device_list_from_udev
	On RHEL 8.2.1 we see random failures in lvcreate and lvchange when
	running tests/storage/stress/reload.py:

	    Failed to udev_enumerate_scan_devices.
	    Volume group "bz1837199-000000000000000000000-0006" not found.
	    Cannot process volume group bz1837199-000000000000000000000-0006

	David Teigland suggested[1] to try obtain_device_list_from_udev=0 to
	mitigate this. Testing shows that this eliminates the errors, and
	improves performance slightly[2].

	This is the second time we disable obtain_device_list_from_udev. We did
	the same in RHEL 6:

	commit 23ce1d87fa98f35d83005e2958ff4065b78ff9d8
	Author: Nir Soffer <nsoffer@redhat.com>
	Date:   Mon Nov 4 22:08:45 2013 +0200

	    lvm: Do not use udev cache for obtaining device list

	We removed the setting in RHEL 7:

	commit e820cc58d0f89480ed4ce701c492501539a0b7ba
	Author: Fred Rolland <frolland@redhat.com>
	Date:   Sun Oct 18 14:51:31 2015 +0300

	    lvm: Use udev cache for obtaining device list

	We could not reproduce any error in RHEL 7.8, so looks like this broke
	again in RHEL 8.2.

	[1] https://bugzilla.redhat.com/1812801#c3
	[2] https://gerrit.ovirt.org/c/109343/

	Bug-Url: https://bugzilla.redhat.com/1842053

2020-06-30  Andrej Cernek  <acernek@redhat.com>

	net, tests: move second networksetuphook test to pytest
	Hook tests are among the last net functional tests that have not been
	moved to pytest yet. The second one tests the after_network_setup hook
	by creating a cookie file and comparing it with the created network.

	As it is the last test being converted, the old test file is
	removed as well.

	net, tests: move first networksetuphook test to pytest
	Hook tests are among the last net functional tests that have not been
	moved to pytest yet. The first one tests the before_network_setup hook
	by changing the new network's bridged attribute to True.

2020-06-29  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: add new dump_checkpoint() API call
	The new VM.dump_checkpoint() API call will be used to fetch the
	checkpoint XML description for a given checkpoint ID.

	The checkpoint XML description is needed for the engine to update the
	checkpoint XML after checkpoint removal. The engine allows removing
	only the root checkpoint in the chain; removing the root checkpoint
	will affect the child checkpoint XML since that checkpoint has no
	parent now.

	Request for VM.dump_checkpoint -
	{
	  'vmID': vm-id,
	  'checkpoint_id': checkpoint-id
	}

	Response -
	{
	  'result': {
	    'checkpoint': <checkpoint-xml>
	  }
	}

2020-06-29  Andrej Cernek  <acernek@redhat.com>

	net, tests: move testIpLinkWrapper to integration tests

2020-06-26  Nir Soffer  <nsoffer@redhat.com>

	docker: Remove specific version for ovirt-imageio-common
	ovirt-imageio-common 2.0.6 is no longer available in the ovirt repos
	and there is no need to require this specific version when the latest
	release is 2.0.8.

	We fixed this issue recently in automation, but forgot to update the
	docker files.

2020-06-26  Amit Bawer  <abawer@redhat.com>

	tool: Add return codes docstring for config_lvm_filter
	Help message example:

	$ vdsm-tool -h
	...
	 config-lvm-filter
	usage:
	 static/usr/bin/vdsm-tool [options] config-lvm-filter
	    Configure LVM filter allowing LVM to access only the local storage
	    needed by the hypervisor, but not shared storage owned by Vdsm.

	    Return codes:
	        0 - Successful completion.
	        1 - Exception caught during operation.
	        2 - Wrong arguments.
	        3 - LVM filter configuration was found to be required but could not be
	            completed since there is already another filter configured on the
	            host.
	        4 - User has chosen not to allow LVM filter reconfiguration, although
	            found as required.

	Bug-Url: https://bugzilla.redhat.com/1522926

	tool: Fix typo in config_lvm_filter printout
	Bug-Url: https://bugzilla.redhat.com/1522926

	tool: Add return codes to config_lvm_filter when cannot configure
	Provide indication to caller about resulting execution status
	of lvm filter configuration.

	By default, vdsm-tool config-lvm-filter returns 0 on successful
	completion. 1 is returned when an exception occurs during execution
	and 2 for errors in parsing wrong arguments.

	This patch adds return code 3 to indicate the tool has detected that
	the lvm filter has to be reconfigured but refrained from doing so
	since there is already another lvm filter configured on the host.

	Return code 4 is added to indicate that the tool has detected that the
	lvm filter has to be reconfigured but the user aborted the operation
	by selecting "NO" at the confirmation input (when executed without the
	--assume-yes|-y option).

	Bug-Url: https://bugzilla.redhat.com/1522926

2020-06-25  Milan Zamazal  <mzamazal@redhat.com>

	hostdev: Omit block_path when it's empty
	When SCSI block device block_path is unavailable, None is set as its
	value in hostdevListByCaps.  This is not correct and causes Engine
	failure in host device processing.

	When block_path is unavailable, it should be omitted from the device
	properties.  This patch fixes that.

	Bug-Url: https://bugzilla.redhat.com/1849275

2020-06-24  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: try to detect networks early during the boot
	We perform all the qemu-ga queries immediately when we notice that the
	agent is running. Often, however, the network is not yet up and
	becomes active only a few seconds later. But since we already queried
	the network, it takes 2 minutes until we attempt to query for changes
	again. Having the IP as soon as possible is useful for automation.

	What this change actually does is that during boot we check the
	already discovered interfaces and IP addresses. If we don't find any
	addresses that are not loopback or link-local, explicitly ignoring the
	lo and docker0 interfaces, a new query for network interfaces is
	issued in every poller iteration.
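
	The "any meaningful address" check could be sketched with the
	ipaddress module (the function name is an assumption):

```python
import ipaddress

def has_routable_address(addresses):
    # A guest is considered "up on the network" once it reports at least
    # one address that is neither loopback nor link-local.
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if not (ip.is_loopback or ip.is_link_local):
            return True
    return False
```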

	qga: move on boot checks into separate function
	There is a small functional change in the edge case when the VM
	running time equals the initial interval (previously we did not do
	anything in that case).

2020-06-24  Liran Rotenberg  <lrotenbe@redhat.com>

	devices: remove backingstore in restore
	Previously, when creating snapshot metadata, the saved data didn't
	contain the backingStore element in the disks. This is due to the
	changes made to the VIR_DOMAIN_XML_MIGRATABLE flag output in
	libvirt >= 6, which is used to produce the memory dump configuration
	file; since the backing chain is now kept stable, it is now part of
	the output when the flag is used. When we create multiple snapshots in
	a single VM run, the backing store was saved and upon loading it
	created confusion.

	This patch removes the backingStore element in the disks on the
	restore memory snapshot flow to prevent backing chain confusion.

	Bug-Url: https://bugzilla.redhat.com/1840609
	Bug-Url: https://bugzilla.redhat.com/1842894

2020-06-24  Sandro Bonazzola  <sbonazzo@redhat.com>

	automation: install openvswitch dependency
	install openvswitch, required by network test suite

	automation: fix openvswitch related failures
	- add oVirt 4.4 CentOS testing repo
	- drop openvswitch from buildroot setup: ovirt-openvswitch requires
	  bash to be available in pre-transaction so it can't be installed
	  during a buildroot creation.
	- add deps needed by ovirt-openvswitch so it won't fail in pre-trans.

2020-06-24  Vojtech Juranek  <vjuranek@redhat.com>

	contrib: fix help message
	Fix the help message in the create-target script and improve the
	summary printout to include information that the size is in GiB.

2020-06-23  Bell Levin  <blevin@redhat.com>

	net: Add used vlanned nic validation
	Add nic usage validation, disallowing vlanned devices from sharing
	nics with bonds.

	Keeping this validation rather than removing it entirely was agreed
	on because it was already implemented in the legacy switch setup.
	Adding it to the vdsm validation rather than removing it from the
	legacy switch makes more sense in the long run, as we support both
	the legacy switch and nmstate for now.

	Even if nmstate allows this kind of scenario, VDSM will not
	support it in order to reduce complexity and potential
	configuration edge cases. The behavior is also in sync with
	engine logic, where such a setup scenario is not supported.

	With this fix the validation for used nics is performed in the
	setupNetworks, regardless of the switch and backend.

2020-06-23  Amit Bawer  <abawer@redhat.com>

	lvm: Add log warning in case all cmd retries have failed
	This adds the missing log information omitted from the lvm reload
	methods and adds logging information to lvm.cmd itself where it is
	relevant.

	Bug-Url: https://bugzilla.redhat.com/1837199

	lvm: Remove wants_output check from cmd method
	Using --select for lvm reload commands introduced a problem as
	we depend on command failure to refresh a stale filter and --select
	usage would return rc=0 even if querying some items has failed.
	This used to be fixed by refreshing the devices filter for the lvm
	command also when a command expected output but none was received.

	Since we revert from using --select for lvm reloads, this parameter
	handling is now removed.

	Bug-Url: https://bugzilla.redhat.com/1837199

	lvm: Remove --select usage from reload lvs
	Revert back to using "lvs [vg1/lv1, vg1/lv2, ...]" for reloading lvm cache.
	Using --select option for lvm command requires a scan of all VGs
	metadata before it can process only the relevant devices for the selected
	entities. In vdsm 4.3 with lvm2-2.02 this introduces a risk of
	transient corruption in VG metadata for reload commands; in vdsm
	4.4 with lvm2-2.03 this is not reproducible, but there is a
	performance overhead for using lvm --select commands. For the sake
	of stability in released and in-progress release versions we revert
	back to the old way.

	This implies that a reload with "lvs vg1/lv1 vg1/lv2" where lv1 is
	missing from LVM will fail the entire command and return Unreadable
	LV entries in the cache for both lv1 and lv2.

	Bug-Url: https://bugzilla.redhat.com/1837199

	lvm: Remove --select usage from reload vgs
	Revert back to using "vgs [vg1, vg2, ...]" for reloading lvm cache.
	Using --select option for lvm command requires a scan of all VGs
	metadata before it can process only the relevant devices for the selected
	entities. In vdsm 4.3 with lvm2-2.02 this introduces a risk of
	transient corruption in VG metadata for reload commands; in vdsm
	4.4 with lvm2-2.03 this is not reproducible, but there is a
	performance overhead for using lvm --select commands. For the sake
	of stability in released and in-progress release versions we revert
	back to the old way.

	This implies that a reload with "vgs vg1 vg2" where vg1 is missing
	from LVM will cause the entire command to fail, marking vg1 as
	Unreadable while still parsing vg2's information, since the VGs
	reload code does not break on failure and parses the valid VGs it
	has found.

	Bug-Url: https://bugzilla.redhat.com/1837199

	lvm: Revert change for returned vgs entries upon reload error
	This patch reverts the change:

	commit c1db4021d02475549cc088b7dd83d1a67289f0d2
	Author: Nir Soffer <nsoffer@redhat.com>
	Date:   Sun Mar 15 15:05:31 2020 +0200

	    lvm: Fix _reloadvgs() return value on errors

	The reverted change made sure that vgs reload would return the
	Unreadable vgs as well as the updated vgs, to correlate with the
	updated LVM cache contents for the queried VGs. Removing --select
	and its soft error handling reveals a bug where Unreadable entries
	within the VG cache would never be cleared, since they were added
	to the updatedVGs dict, which is considered to hold the most
	up-to-date status of all queried VGs in the vgs reload flow.

	Bug-Url: https://bugzilla.redhat.com/1837199

	lvm_test: Add test for reload of invalidated pv
	Test cases where we reload a specifically invalidated pv, either a
	stale pv or a valid one.

	Bug-Url: https://bugzilla.redhat.com/1837199

	lvm: Remove --select usage from reload pvs
	Revert back to using "pvs [pv1, pv2, ...]" for reloading lvm cache.
	Using --select option for lvm command requires a scan of all VGs
	metadata before it can process only the relevant devices for the selected
	entities. In vdsm 4.3 with lvm2-2.02 this introduces a risk of
	transient corruption in VG metadata for reload commands; in vdsm
	4.4 with lvm2-2.03 this is not reproducible, but there is a
	performance overhead for using lvm --select commands. For the sake
	of stability in released and in-progress release versions we revert
	back to the old way.

	This implies that a reload with "pvs pv1 pv2" where pv1 is missing
	from LVM will cause the entire command to fail, marking pv2 as
	Unreadable and returning those for the getPV and getAllPVs methods;
	hence stale PV tests that used to pass with pvs --select for
	reloads are modified.

	Bug-Url: https://bugzilla.redhat.com/1837199

	logutils: Add Head class for shortening item lists in logs
	Usage example:

	   items = list(range(1000))
	   log.info("Handled items: %s", Head(items, max_items=10))

	This would log:
	   "Handled items: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...]"
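	A minimal sketch of such a Head helper (the real logutils
	implementation may differ in details):

```python
class Head:
    """Lazily format at most max_items items of a list for logging."""

    def __init__(self, items, max_items=10):
        self.items = items
        self.max_items = max_items

    def __repr__(self):
        head = self.items[:self.max_items]
        if len(self.items) > self.max_items:
            # Mark that the list was truncated.
            return "[%s, ...]" % ", ".join(repr(i) for i in head)
        return repr(head)
```

	Because formatting happens in __repr__, the cost of shortening the
	list is paid only if the log record is actually emitted.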

	Bug-Url: https://bugzilla.redhat.com/1837199

2020-06-22  Nir Soffer  <nsoffer@redhat.com>

	vdsm-tool: config-lvm-filter: Add -y --assume-yes option
	Previously we had to run:

	    echo "yes" | vdsm-tool config-lvm-filter

	When using the tool in scripts. Now we can use:

	    vdsm-tool config-lvm-filter -y

	This skips the confirmation step and configures the host
	automatically.

	image: Allow single uploadImageToStream per image
	uploadImageToStream() and downloadImageFromStream() locking looks
	correct; downloadImageFromStream, writing data to the image, takes
	an exclusive lock, while uploadImageToStream, reading from the
	image, takes a shared lock. So we cannot have conflicts between
	readers and writers.

	However we can have multiple readers calling Image.copyFromImage(). This
	function does basically:

	    activate volume
	    try:
	        copy image data
	    finally:
	        deactivate volume

	This fails randomly, when one reader deactivates the volume right
	after the other reader activates it, just before that reader tries
	to read from the volume:

	thread 1: activate volume
	thread 1: copy image data
	thread 2: activate volume (refresh active volume)
	thread 1: deactivate volume
	thread 2: fail trying to copy image data

	Fixed by taking an exclusive lock also in uploadImageToStream, so
	we can have only one upload at a time. Since this is used to upload
	and download small payloads, this locking should not be an issue.

	Bug-Url: https://bugzilla.redhat.com/1694972

	http: Close connection after errors
	The http server uses keep alive connections but this does not work well
	when the internal task fails while reading the body from the client or
	sending the body to the client.

	The best way to handle errors in this case is to close the connection
	right after the error. This allows the client to detect the error early
	instead of timing out after a long delay.

	download: Add stress test for vdsm http server
	The stress test reproduces random failures when downloading images
	concurrently.

	Unfortunately this test must run on a real setup, configured
	manually with the hostname (matching the engine certificate), pool
	id, domain id, and image id. This is good enough to reproduce and
	test the fix.

	The test is very slow, about 5 minutes, performing 1000 downloads, but
	it reproduces about 160 errors per run without the fix.

	Bug-Url: https://bugzilla.redhat.com/1694972

2020-06-19  Vojtech Juranek  <vjuranek@redhat.com>

	storage: remove StorageDomain.invalidate() method
	StorageDomain.invalidate() is not used anywhere - remove this dead code.

	blocksd: remove SD mappings upon SD invalidation
	Currently, when removing a stale SD from the cache, we also try to
	remove all SD mappings. This is wrong, as evicting the cache
	shouldn't impact the running system in any way. A temporary glitch
	can cause the VG not to be reloaded, so it is considered stale and
	all its mappings are removed.

	Remove VG mapping cleanup from LVMCache._reloadvgs() and log only
	a warning when removing stale VG (as we do for PV or LV).

	Bug-Url: https://bugzilla.redhat.com/1846331

2020-06-18  Tomáš Golembiovský  <tgolembi@redhat.com>

	vm: clarify documentation for vm.freeze/vm.thaw
	The docs were misleading. We don't support any hypervisor other
	than QEMU (NB: freeze/thaw has not been added for any other
	hypervisor in libvirt to this day), so we always need qemu-ga in
	the guest.

2020-06-10  Michal Skrivanek  <michal.skrivanek@redhat.com>

	require libvirt for r/o snapshots
	available in RHEL 8.2.1 only for now

	Bug-Url: https://bugzilla.redhat.com/1821627

2020-06-10  Milan Zamazal  <mzamazal@redhat.com>

	hostdev: Don't log traceback on empty ndctl output

2020-06-10  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Added check and change for ipv6 hostnames in gluster vol list
	The following gluster command fails in vdsm when run with IPv6
	hostnames:

	[root@ ]# gluster --remote-host=host1-storage.lab.eng.blr.redhat.com volume info engine
	Connection failed. Please check if gluster daemon is operational.

	It has been observed that adding '--inet6' for an IPv6 hostname
	works:

	[root@newhost ~]# gluster --remote-host=myhost.lab.eng.blr.redhat.com --inet6 volume list
	testrep

	This patch checks if a given hostname is ipv6 enabled and adds '--inet6'
	to the gluster command.
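	A sketch of such a check (the function names and the resolution
	strategy here are assumptions, not the actual vdsm gluster cli
	code):

```python
import socket

def is_ipv6_host(hostname):
    """Return True if the host resolves to an IPv6 address."""
    try:
        socket.getaddrinfo(hostname, None, socket.AF_INET6)
        return True
    except socket.gaierror:
        return False

def gluster_command(hostname):
    cmd = ["gluster", "--remote-host=%s" % hostname]
    if is_ipv6_host(hostname):
        # Without --inet6 the connection to an IPv6 host fails.
        cmd.append("--inet6")
    return cmd
```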

	Added GlusterFQDNToIpResolveException to the list of known exceptions
	Bug-Url: https://bugzilla.redhat.com/1841076

	gluster: Add GlusterHookCheckSumMismatchException as the subclass of GlusterHookException
	GlusterHookCheckSumMismatchException was improperly added as a
	child of GlusterException instead of GlusterHookException,
	resulting in errors

	error:
	unbound method __init__() must be called with GlusterHookException
	instance as first argument (got GlusterHookCheckSumMismatchException instance instead)"}]

	This patch fixes this.
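	The fix amounts to correcting the base class (class bodies are
	simplified sketches):

```python
class GlusterException(Exception):
    pass

class GlusterHookException(GlusterException):
    pass

# Before the fix this class derived from GlusterException directly;
# making it a subclass of GlusterHookException restores the intended
# hierarchy and the __init__ call chain.
class GlusterHookCheckSumMismatchException(GlusterHookException):
    pass
```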

	Bug-Url: https://bugzilla.redhat.com/1748752

2020-06-09  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.20

2020-06-09  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: handle NotConnectedError exceptions
	The domain may not be defined in many cases, not just during start.
	The domain may not be defined on the host because a migration is in
	progress, or it may already be undefined because of shutdown.

	qga: log errors from libvirt guestInfo()
	Errors from communication with the agent should be handled gracefully.
	Here we perform only informational checks and none of the errors
	produced should be considered critical. Despite our best efforts
	the agent can become unresponsive at any time or start misbehaving.

2020-06-09  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Fix errors in gfapi module
	Fixed places in the gfapi.py module where certain string variables
	had to be encoded to utf-8 before being passed to a ctypes
	function. Also fixed: the gfapi module is now called as
	vdsm.gluster.gfapi rather than gluster.gfapi.
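	The kind of fix involved can be illustrated with plain ctypes (a
	generic sketch, not the actual gfapi calls):

```python
import ctypes

volname = "engine"

# Under Python 3, c_char_p expects bytes, not str:
try:
    ctypes.c_char_p(volname)
    raise AssertionError("unreachable: str must be rejected")
except TypeError:
    pass

# Encoding to utf-8 first yields a value ctypes accepts:
arg = ctypes.c_char_p(volname.encode("utf-8"))
assert arg.value == b"engine"
```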

	Bug-Url: https://bugzilla.redhat.com/1842767

2020-06-08  Germano Veit Michel  <germano@redhat.com>

	tests: reload: Fix --read-only default value
	If store_false is used as action for an argparse
	argument, it automatically creates a default with
	the opposite value (true).

	See: https://docs.python.org/3/library/argparse.html

	So currently --read-only is set to True if the user
	omits the option.

	$ ./reload.py run 2>&1 | egrep -o 'read_only=[A-Za-z]+'
	read_only=True

	$ ./reload.py run --read-only 2>&1 | egrep -o 'read_only=[A-Za-z]+'
	read_only=False

	The LVM workers (that make changes) use the default
	from the constructor (False), but the reloaders are
	always running with read_only True from argparse.

	I think we want read_only=False by default, unless
	the user types "--read-only", at which point we
	do read_only=True.
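	The argparse behavior described above is easy to demonstrate (a
	standalone sketch, not the reload.py code itself):

```python
import argparse

buggy = argparse.ArgumentParser()
# store_false implies default=True: omitting the flag yields True.
buggy.add_argument("--read-only", dest="read_only", action="store_false")
assert buggy.parse_args([]).read_only is True
assert buggy.parse_args(["--read-only"]).read_only is False

fixed = argparse.ArgumentParser()
# The intended behavior: False unless --read-only is given.
fixed.add_argument("--read-only", dest="read_only", action="store_true")
assert fixed.parse_args([]).read_only is False
assert fixed.parse_args(["--read-only"]).read_only is True
```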

	Related-To: https://bugzilla.redhat.com/1837199

2020-06-08  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Fixed arguments for GlusterCmdExecFailedException in thinstorage.py

	gluster: Modified test for vdoVolumeList
	Added a test where vdoVolumeList returns an empty list
	when vdo stats is not available.
	Added another test case for invalid vdo data.
	Removed the '1K-blocks available' entry from the list to test
	invalid values.
	Bug-Url: https://bugzilla.redhat.com/1612152

2020-06-08  Amit Bawer  <abawer@redhat.com>

	automation: Unspecify version for ovirt-imageio-common
	This was required during transition from version 2.0.5 to 2.0.6
	but now results in an outdated lookup for CI dependency:

	 Error: Unable to find a match: ovirt-imageio-common-2.0.6

2020-06-05  Amit Bawer  <abawer@redhat.com>

	tests: reload: Add failures counter and amend failure logs
	Count the number of reloads which did not succeed on any retry.

	Related-To: https://bugzilla.redhat.com/1837199

2020-06-05  Denis Chaplygin  <dchaplyg@redhat.com>

	gluster: Ignore empty VDO output
	Sometimes VDO does not report device statistics, showing a 'not
	available' message instead. Those VDO devices must be ignored.
	Missing and invalid vdo statistics are also ignored.

	Bug-Url: https://bugzilla.redhat.com/1612152

2020-06-03  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: add 'parent_checkpoint_id' to BackupConfig
	Currently, VDSM wrongly used from_checkpoint_id as the parent checkpoint
	when starting a backup with a checkpoint.

	This works only when backing up since the last backup, which is the
	normal flow. But if the last backup failed after starting, the user
	needs to use the last successful backup's checkpoint ID.

	Users may also have other reasons to use an older checkpoint ID when
	starting a backup, for example, if backup media was removed by mistake.
	Another flow is when the user needs to create a full VM backup when the
	VM has a chain of existing checkpoints.

	VDSM cannot distinguish between two VM backup possibilities -
	  1. Starting an incremental VM backup.
	  2. Starting a full VM backup when having a chain of defined checkpoints.

	For that reason a new parent_checkpoint_id argument was added to
	distinguish between the flows and set the parent_id of the created
	checkpoint properly.

	The following combinations of 'parent_checkpoint_id' and
	'from_checkpoint_id' attributes can be used:

	  1. Full VM backup without a previous checkpoints chain -
	     parent_checkpoint_id=None, from_checkpoint_id=None

	  2. Full VM backup with a previous checkpoints chain -
	     parent_checkpoint_id=111-222, from_checkpoint_id=None

	  3. Incremental VM backup -
	     parent_checkpoint_id=111-222, from_checkpoint_id=333-444

	  4. Incremental VM backup without specifying parent_checkpoint_id -
	     Will raise a backupError

	Starting a backup will fail also if:

	  1. parent_checkpoint_id provided but the defined checkpoints list
	     on libvirt is empty.
	  2. parent_checkpoint_id doesn't match the defined checkpoints list
	     leaf on libvirt.
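	The four combinations can be sketched as a small validation helper
	(BackupError and the function shape are illustrative; the real
	backup.py logic also consults libvirt's defined checkpoint list):

```python
class BackupError(Exception):
    pass

def backup_mode(parent_checkpoint_id, from_checkpoint_id):
    """Classify a backup request per the combinations above."""
    if from_checkpoint_id is None:
        # Full backup; parent_checkpoint_id (if any) becomes the
        # parent of the newly created checkpoint.
        return "full"
    if parent_checkpoint_id is None:
        # Combination 4: incremental backup without a parent.
        raise BackupError("incremental backup requires parent_checkpoint_id")
    return "incremental"
```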

2020-06-03  Amit Bawer  <abawer@redhat.com>

	tests: reload: Add option for setting number of retries in lvm reload
	By running ./reload.py --retries=N we add N more attempts for the
	lvm reload commands pvs/vgs/lvs in case the original invocation
	fails.

	Related-To: https://bugzilla.redhat.com/1837199

2020-06-03  Milan Zamazal  <mzamazal@redhat.com>

	machinetype: Add spec_ctrl feature for -IBRS model
	If -IBRS suffix is present in the host CPU model, we can assume
	spec_ctrl CPU feature.  Let's add it in such a case.

	It should be handled in Engine, but it's easier and more robust to
	handle it in Vdsm.

	Bug-Url: https://bugzilla.redhat.com/1841030

2020-06-03  Amit Bawer  <abawer@redhat.com>

	API: Add annotations to VolumeDump optional fields
	In case of dumping a volume with invalid metadata, most fields
	would be omitted. Add a default value and a description note for
	such fields in the schema.

	Bug-Url: https://bugzilla.redhat.com/1557147

	fileSD: Dump image UUID for volumes with invalid metadata
	Keep fallback format consistent with blockSD volumes dump
	where the missing information is complemented from the LV tags
	in case of invalid metadata slot contents for a dumped volume.

	Since we cannot infer the parent UUID for a file volume with no
	metadata, we amend this field to be optional in the volume dump API
	schema.

	Volume info example for file volume with invalid metadata:

	'a220691a-580d-4c55-8d04-4806c29466c2': {
	    'apparentsize': 2147483648,
	    'image': '75c3879b-1a81-4513-9357-782989cf2189',
	    'status': 'INVALID',
	    'truesize': 4096
	}

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-06-02  Amit Bawer  <abawer@redhat.com>

	tests: reload: Use common reload function for pvs/lvs/vgs reloaders
	Refactor the common parts of the reloaders to a function which
	will also serve the addition of retries option in a patch to follow.

	Related-To: https://bugzilla.redhat.com/1837199

	tests: reload: Move reloader statistics into ReloaderStats class
	This refactoring will make updating and passing stats around
	easier.

	Related-To: https://bugzilla.redhat.com/1837199

	tests: reload: Add option to run read-only pvs/vgs/lvs commands
	Using ./reload.py --read-only will use locking_type=4 for the
	reloader commands, allowing read-only invocation of pvs/vgs/lvs
	commands to be tested while commands with locking_type=1, which
	modify the VGs metadata, are executed by the workers.

	Related-To: https://bugzilla.redhat.com/1837199

2020-06-02  Nir Soffer  <nsoffer@redhat.com>

	tests: reload: Add option to control lvm verbosity
	When handling lvm command failures, LVM developers typically ask to
	run the command with -vvvv. Make it easy to provide this by adding
	a --verbose option.

	When using verbose mode, LVM generates a lot of output (1.5-4 MiB
	in this script), and logging it is not helpful. Errors are dumped
	to files named "command-error-nnnn.txt" with all the info about the
	error. This makes it very easy to file useful bugs.

	When using --debug mode, only the first 200 characters of stdout and
	stderr are logged when logging command completion.

	Since verbose mode creates huge output and we care only about the
	reload commands, create 2 separate lvm runners - one for the
	workers and one for the reloaders. The reloader instance uses
	args.use_udev and args.verbose, while the workers use the defaults.

	Related-To: https://bugzilla.redhat.com/1837199

2020-06-02  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.19

2020-06-01  Nir Soffer  <nsoffer@redhat.com>

	tests: reload: Add option to disable --select
	Using --select 'vg_name = xxx' or --select 'pv_name =
	/dev/mapper/yyy', LVM processes the metadata of all vgs and then
	filters the results for display. If processing the metadata of an
	unrelated vg fails because it is being updated by another LVM
	command, the reload may fail.

	Add --no-select option so we can compare robustness and timing with and
	without --select.

	When reloading lvs with --no-select, we select only the lv, which has
	the important property of not failing the command when the lv does not
	exist, which is required for the current test.

	Testing on CentOS 7.8 shows that using --select is slower compared
	with explicitly specifying pvs and vgs. We perform the same amount
	of work in similar time, but the number of reloads is about 40%
	higher.

	$ time ../reload.py --lv-count 50 --no-udev

	2020-05-31 18:00:39,387 INFO    (reload/pv) Stats: reloads=616 errors=0 error_rate=0.00% avg_time=0.558 med_time=0.489 min_time=0.135 max_time=1.274
	2020-05-31 18:00:39,561 INFO    (reload/lv) Stats: reloads=603 errors=0 error_rate=0.00% avg_time=0.570 med_time=0.494 min_time=0.132 max_time=1.438
	2020-05-31 18:00:39,666 INFO    (reload/vg) Stats: reloads=544 errors=0 error_rate=0.00% avg_time=0.632 med_time=0.556 min_time=0.154 max_time=2.055

	real 5m44.036s
	user 2m0.997s
	sys 1m52.350s

	$ time ../reload.py --lv-count 50 --no-udev --no-select

	2020-05-31 18:08:39,412 INFO    (reload/pv) Stats: reloads=698 errors=0 error_rate=0.00% avg_time=0.489 med_time=0.420 min_time=0.132 max_time=1.564
	2020-05-31 18:08:39,489 INFO    (reload/vg) Stats: reloads=869 errors=0 error_rate=0.00% avg_time=0.393 med_time=0.389 min_time=0.129 max_time=0.861
	2020-05-31 18:08:39,520 INFO    (reload/lv) Stats: reloads=918 errors=0 error_rate=0.00% avg_time=0.372 med_time=0.375 min_time=0.128 max_time=0.808

	real 5m41.847s
	user 1m43.990s
	sys 1m55.916s

	With 50 lvs per vg we don't reproduce any failures. We need to test with
	500 lvs to reproduce the errors.

	Related-To: https://bugzilla.redhat.com/1837199

	tests: reload: Add option to disable udev
	On RHEL 8.2.1 we see random failures in lvcreate and lvchange:

	    Failed to udev_enumerate_scan_devices.
	    Volume group "bz1837199-000000000000000000000-0006" not found.
	    Cannot process volume group bz1837199-000000000000000000000-0006

	David Teigland suggested[1] trying obtain_device_list_from_udev=0
	to mitigate this.

	Add a --no-udev option, setting obtain_device_list_from_udev=0 in
	the lvm --config. This is useful to test whether this fixes the
	issue on RHEL 8.2, and to test the effect on RHEL 7.8.

	This is the second time we disable obtain_device_list_from_udev. We did
	the same in RHEL 6:

	commit 23ce1d87fa98f35d83005e2958ff4065b78ff9d8
	Author: Nir Soffer <nsoffer@redhat.com>
	Date:   Mon Nov 4 22:08:45 2013 +0200

	    lvm: Do not use udev cache for obtaining device list

	We removed the setting in RHEL 7:

	commit e820cc58d0f89480ed4ce701c492501539a0b7ba
	Author: Fred Rolland <frolland@redhat.com>
	Date:   Sun Oct 18 14:51:31 2015 +0300

	    lvm: Use udev cache for obtaining device list

	So either this was not fixed in RHEL 7, or it broke again in RHEL 8.2.
	Having the option will make it easier to test if this is really fixed
	next time.

	Using udev should speed up LVM commands, but testing on CentOS 7.8
	shows that it is slightly faster without udev. We perform the same
	work but the number of reloads is about 5% higher.

	$ time ../reload.py --lv-count 50
	...
	2020-05-31 17:47:10,809 INFO    (reload/lv) Stats: reloads=567 errors=0 error_rate=0.00% avg_time=0.607 med_time=0.511 min_time=0.133 max_time=1.696
	2020-05-31 17:47:10,810 INFO    (reload/pv) Stats: reloads=585 errors=0 error_rate=0.00% avg_time=0.588 med_time=0.488 min_time=0.135 max_time=1.708
	2020-05-31 17:47:10,913 INFO    (reload/vg) Stats: reloads=518 errors=0 error_rate=0.00% avg_time=0.664 med_time=0.534 min_time=0.153 max_time=1.718

	real 5m44.438s
	user 2m7.326s
	sys 2m7.907s

	$ time ../reload.py --lv-count 50 --no-udev

	2020-05-31 18:00:39,387 INFO    (reload/pv) Stats: reloads=616 errors=0 error_rate=0.00% avg_time=0.558 med_time=0.489 min_time=0.135 max_time=1.274
	2020-05-31 18:00:39,561 INFO    (reload/lv) Stats: reloads=603 errors=0 error_rate=0.00% avg_time=0.570 med_time=0.494 min_time=0.132 max_time=1.438
	2020-05-31 18:00:39,666 INFO    (reload/vg) Stats: reloads=544 errors=0 error_rate=0.00% avg_time=0.632 med_time=0.556 min_time=0.154 max_time=2.055

	real 5m44.036s
	user 2m0.997s
	sys 1m52.350s

	[1] https://bugzilla.redhat.com/1812801#c3

	Bug-Url: https://bugzilla.redhat.com/1842053
	Related-To: https://bugzilla.redhat.com/1837199

	tests: reload: Extract LVMRunner class
	Clean up the code by extracting LVMRunner class enforcing --config for
	all commands. This is pretty important since you must run this test as
	root, and the config protects from unwanted changes in unrelated vgs.

	Instead of passing lvm_config around, we pass LVMRunner instance used to
	perform all lvm commands.

	Common commands moved into LVMRunner. Special commands use the generic
	run() method adding the config.

	Related-To: https://bugzilla.redhat.com/1837199

	tests: reload: Fix reload counting
	Include all reloads in the "reloads" counter, so we get a more
	useful error rate (errors / total reloads) instead of (errors /
	successful reloads).

	Include the time spent in failed reloads in the timing to get more
	accurate timing info.

	Related-To: https://bugzilla.redhat.com/1837199

	tests: Add stress test for lvm reloads
	Add a stress test reproducing reload failures when running pvs, vgs, and
	lvs concurrently with lvm commands modifying storage.

	Here is an example error reproduced on CentOS 7.8 running
	lvm2-2.02.186-7.el7_8.1:

	    lvs --config '<vdsm-config>' --select 'vg_name = <vg-name> && lv_name = <lv-name>'

	    Scan of VG bz1837199-000000000000000000000-0009 from
	    /dev/mapper/delay0000000000000000000000000009 found metadata seqno
	    2220 vs previous 2219.
	    Metadata on /dev/mapper/delay0000000000000000000000000000 at
	    128238592 has wrong VG name "<random junk>" expected
	    bz1837199-000000000000000000000-0000.
	    Not repairing metadata for VG bz1837199-000000000000000000000-0000.
	    Recovery of volume group "bz1837199-000000000000000000000-0000" failed.
	    Cannot process volume group bz1837199-000000000000000000000-0000'

	I could not reproduce any error on Fedora 31, using lvm2-2.03.09-1.

	This stress test can be used to find the most robust and efficient way
	to reload lvm metadata, and to verify that a new lvm version does not
	introduce regressions that may affect vdsm.

	The test should be a standalone script that can run on any host
	without installing vdsm, so it is also useful to LVM developers. We
	don't import anything from vdsm.

	Related-To: https://bugzilla.redhat.com/1837199

2020-06-01  Milan Zamazal  <mzamazal@redhat.com>

	hostdev: Report available NVDIMM namespace devices
	This patch adds NVDIMM namespace devices to the list of host devices.
	Although NVDIMMs are actually treated as memory devices rather than
	host devices in libvirt, it's more practical to handle them similarly
	to host devices in Engine and the web UI.

2020-05-30  Pavel Bar  <pbar@redhat.com>

	constants, network: Use size constants for KiB, MiB
	- Use "KiB" and "MiB" constants instead of magic numbers.

2020-05-28  Ales Musil  <amusil@redhat.com>

	net: Add deprecation warning for initscripts

2020-05-27  Marcin Sobczyk  <msobczyk@redhat.com>

	hooks: vhostmd: Handle virtio channels
	Vhostmd 1.1 introduces a new way for VMs to communicate with the
	'vhostmd' daemon running on the host - a virtio serial channel.
	This patch adds support for using this transport (on top of the
	existing block device method, which is still relevant) to the
	'vhostmd' hook, along with some tests.

	Insertion of the block device and of the virtio serial channel is
	decided dynamically and depends on the configuration of 'vhostmd'.
	Both can (and will, if the config says so) be enabled
	simultaneously.

	[1] https://github.com/vhostmd/vhostmd/blob/52b2dbf5c7136f87b1e6a6f4a10c363779bd1fb4/README

2020-05-27  Bell Levin  <blevin@redhat.com>

	net, CI: Add containerized integration tests
	Introducing the possibility to run the integration tests
	in a container.

	This helps with getting the same result across different
	environments.

2020-05-26  Vojtech Juranek  <vjuranek@redhat.com>

	tests: mark glance download test as integration
	Download test runs against real production Glance server. Not to cause
	unexpected load on this server, mark it as integration, as it's skipped
	by default.

	tox: add integration pytest mark
	Add an "integration" pytest mark. It marks tests which test vdsm
	integration with 3rd-party services. These tests can be run e.g.
	against public production services. Doing so with automated tests
	is not polite, as the tests can generate random spikes or other
	issues on the service servers; thus they are skipped by default.

	In the future, we can use containers and run these services against our
	servers running in the containers (e.g. started and stopped using pytest
	fixture). As these services can consume a larger amount of
	resources, we can reuse this mark for such tests.

2020-05-26  Amit Bawer  <abawer@redhat.com>

	lvm: Break too long suppression pattern lines
	Minor refactoring to avoid extra-long lines that required a linter
	hint (# NOQA: E501, potentially long line) to go along with them.

	Bug-Url: https://bugzilla.redhat.com/1814022

	lvm: Suppress warnings about inconsistent VG metadata
	Since RHV deployments allow multiple hosts to access the same VG
	metadata with no distributed LVM lock other than sanlock, the
	following warnings appear in logs for successful LVM operations:

	    WARNING: ignoring metadata seqno 1566 on /dev/mapper/x for seqno 1567 on /dev/mapper/y for VG vg-name.
	    WARNING: Inconsistent metadata found for VG vg-name.

	These messages may be a real warning when running LVM locally, but
	in an RHV cluster setup they are expected.
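	A sketch of the kind of suppression filter involved (the patterns
	are derived from the warnings quoted above; the real vdsm lvm
	filter keeps a larger pattern list):

```python
import re

# Warnings expected in a sanlock-protected RHV cluster; they should
# not be treated as errors of otherwise successful LVM commands.
SUPPRESS = [
    re.compile(r"WARNING: ignoring metadata seqno \d+ on \S+ "
               r"for seqno \d+ on \S+ for VG \S+"),
    re.compile(r"WARNING: Inconsistent metadata found for VG \S+"),
]

def filter_stderr(lines):
    """Drop suppressed warning lines, keeping everything else."""
    return [line for line in lines
            if not any(p.match(line) for p in SUPPRESS)]
```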

	Bug-Url: https://bugzilla.redhat.com/1814022

	lvm: Unsuppress warning for unknown global/event_activation
	fc30 platforms using lvm2-2.02 did not support disabling automatic
	lv device activation and as such would warn:

	"Configuration setting global/event_activation unknown"

	Now that fc30 has reached EOL and fc31 already uses lvm2-2.03, we
	unsuppress such warnings in the filter.

	Bug-Url: https://bugzilla.redhat.com/1814022

	lvm: Unsuppress warning for mixing lv tags changes with activation
	The following LVM warning used to be suppressed:

	WARNING: Combining activation change with other commands is not advised.

	Now that BZ#1639360 is fixed, we unsuppress this warning as it
	should not occur during LVM command execution.

	Bug-Url: https://bugzilla.redhat.com/1814022
	Related-To: https://bugzilla.redhat.com/1639360

2020-05-26  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.18

2020-05-26  Ales Musil  <amusil@redhat.com>

	net, tests: Remove custom compilation for initscripts
	Version 10.00.4 of initscripts is now included in CentOS 8.1, so
	any newer image for the functional tests should include it without
	the need to compile it from source.

2020-05-26  Bell Levin  <blevin@redhat.com>

	net tests: Remove bridge retain ip test
	This is already tested in the following test:
	test_move_nic_between_bridgeless_and_bridged_keep_ip

	net tests: Remove duplicate bridge reconf test
	This test check was introduced in [1], where changing the IP
	address of a bridge would not reconfigure the bridge.
	The same test is being run in different places in the func tests,
	for example in "test_restore_network_static_ip_from_config".

	[1] https://gerrit.ovirt.org/#/c/37059/

	net tests: Remove dhcp leases test
	The dhcp leases functionality was deprecated and is not relevant
	anymore.

	net tests: Remove blockingdhcp test
	This test is not relevant anymore since ifcfg is deprecated, and
	nmstate succeeds in adding a network even if no dhcp server is in
	place.

	It is a slow test, which was not run up until now. Setting up a
	slow marker for pytest and converting this test for the legacy
	switch only is not worth the effort and running time.

	net tests: Remove old vlanned bonded rollback test
	This test checks that the bond and vlanned net are rolled back to
	the original state when setupNetworks fails while editing the net,
	and that no bond or vlan is added.

	This is already checked throughout the rollback_test.py in the net
	functional tests, namely in test_edit_network_fails,
	test_setup_two_networks_second_fails.

	Testing that no bond / vlan was added is covered in
	test_remove_broken_network as well.

	net test: Remove old setup over existing dhcp test
	The removed test checks that setupNetwork on top of an
	interface with a running dhclient process is successfully
	configured by vdsm.

	This scenario is already tested in TestStopDhclientOnUsedNics.

2020-05-25  Nir Soffer  <nsoffer@redhat.com>

	travis: Use images from quay
	I added our docker images to quay and found that it works better
	than docker: manual builds are fast, and automatic builds on every
	push to GitHub work. Change travis to pull images from quay.

2020-05-25  Eyal Shenitzky  <eshenitz@redhat.com>

	backup.py: implement delete_checkpoints
	Implement delete_checkpoints verb.

	Deleting checkpoint is needed by the Engine in the following
	cases:
	  - For reducing the number of known checkpoints for a VM,
	    delete X oldest checkpoints to keep number of checkpoints small.

	  - When the Engine and libvirt cannot be synced, delete all checkpoints.

	The given checkpoint IDs are a list of checkpoint IDs reported by
	libvirt to the Engine as checkpoints that are defined on the VM; the
	list is ordered from the oldest checkpoint ID to the newest checkpoint
	ID.

	In case of failure to remove a checkpoint from the chain, the operation
	will stop and will not continue to remove the following checkpoints on
	the list. In that case, the details of the error will be attached to the
	response:

	{
	  'checkpoint_ids': [removed_checkpoint_1, removed_checkpoint_2, ... ],
	  'error': {
	    'code': err_code,
	    'message': "err_msg"
	  }
	}

2020-05-25  Vojtech Juranek  <vjuranek@redhat.com>

	images: use glance module directly for upload/download
	Currently the only supported external service for uploading and
	downloading images is OpenStack Glance. Use the glance module directly
	in the image module and remove unused methods and related code from
	the imageSharing module.

	images: use glance module
	Use the newly created glance module for obtaining image size and for
	uploading and downloading images.

	Bug-Url: https://bugzilla.redhat.com/1704349

	storage: add glance module
	Add a glance module which provides functions for getting image size,
	using both the Glance v1 API (deprecated) and the v2 API (current),
	and for uploading and downloading images.

	Add support for the Glance v2 API. This API doesn't support obtaining
	image metadata using HTTP HEAD as in v1 and returns HTTP 405 (method
	not allowed). HTTP GET has to be used instead.

	When downloading an image, in the v1 API we just call HTTP GET, while
	in the v2 API we need to add a "/file" suffix to the image URL.

	The OpenStack v1 API was deprecated by the v2 API a long time ago and
	removed in the OpenStack Rocky release [1], released on Aug. 30,
	2018 [2]. As Glance v1 is currently still used by the engine, keep
	support for the v1 API until it's removed from the engine.

	Also add tests for these functions. As we have a public oVirt Glance
	server, use it for the tests to get realistic responses without having
	to mock the server. If the server is not available, the tests are
	skipped.

	[1] https://docs.openstack.org/api-ref/image/versions/index.html#what-happened-to-the-v1-api
	[2] https://releases.openstack.org/rocky/index.html
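
	The v1/v2 differences above boil down to which HTTP method and URL
	to use. A hedged sketch of the URL handling (function names are
	illustrative, not vdsm's actual glance module):

```python
def metadata_request(image_url, api_version):
    """Return (method, url) to fetch image metadata.

    The v1 API supports HTTP HEAD; the v2 API returns HTTP 405
    (method not allowed) for HEAD, so the image record must be
    fetched with GET instead.
    """
    if api_version == 1:
        return ("HEAD", image_url)
    return ("GET", image_url)


def download_url(image_url, api_version):
    """In the v2 API, image data lives at the "/file" sub-resource."""
    if api_version == 1:
        return image_url
    return image_url + "/file"


# Placeholder URL for illustration only.
url = "https://glance.example.com/v2/images/abc"
method, meta_url = metadata_request(url, 2)
data_url = download_url(url, 2)
```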

	curl: don't use mutable default arguments
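
	The pitfall behind this one-liner: a mutable Python default is
	evaluated once at function definition time and shared across calls.
	A minimal illustration (names hypothetical, not the actual curl
	module):

```python
def head_bad(url, headers=[]):  # BUG: one list shared by all calls
    headers.append("User-Agent: vdsm")
    return headers


def head_good(url, headers=None):  # fixed: fresh list per call
    if headers is None:
        headers = []
    headers.append("User-Agent: vdsm")
    return headers


# The buggy version accumulates state across calls:
first = len(head_bad("http://a"))   # 1
second = len(head_bad("http://b"))  # 2 - leftover from the first call
```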

2020-05-25  Eli Mesika  <emesika@redhat.com>

	fix JSON-RPC error when fencing fails
	This patch fixes the fence method to stop using the deprecated execCmd
	to run the fencing script and to use the start method instead; under
	Python 3, execCmd returns the error and output as byte arrays.

	In addition, the output/error may be binary and should be decoded as
	UTF-8 before the result is returned to the caller.

	Bug-Url: https://bugzilla.redhat.com/1810974

2020-05-25  Nir Soffer  <nsoffer@redhat.com>

	tox: Show top 20 slowest storage tests
	Storage tests take more than 500 seconds in Jenkins. Let's show more
	of the slow tests to make them easier to monitor and improve.

	= 1940 passed, 77 skipped, 100 deselected, 177 xfailed, 1 xpassed, 5 warnings in 539.50 seconds =

	========================== slowest 20 test durations ===========================
	 15.27s call     tests/storage/blocksd_test.py::test_dump_sd_metadata[5]
	 15.03s call     tests/storage/blocksd_test.py::test_dump_sd_metadata[4]
	 10.87s call     tests/storage/formatconverter_test.py::test_convert_to_v5_block[3]
	  9.79s call     tests/storage/formatconverter_test.py::test_convert_to_v5_block[4]
	  7.99s call     tests/storage/blocksd_test.py::test_create_snapshot_size[4]
	  7.97s call     tests/storage/blocksd_test.py::test_create_snapshot_size[5]
	  7.50s call     tests/storage/blocksd_test.py::test_volume_life_cycle[5]
	  7.46s call     tests/storage/blocksd_test.py::test_volume_life_cycle[4]
	  7.30s call     tests/storage/blocksd_test.py::test_spm_lifecycle
	  6.81s call     tests/storage/blocksd_test.py::test_volume_life_cycle[3]
	  5.68s call     tests/storage/mailbox_test.py::TestCommunicate::test_roundtrip[63-0.05]
	  5.60s call     tests/storage/blocksd_test.py::test_volume_metadata[4]
	  5.03s call     tests/storage/blocksd_test.py::test_create_domain_metadata[4]
	  5.01s call     tests/storage/hsm_connect_test.py::test_refresh_storage_once[3-expected_calls4]
	  5.01s call     tests/storage/hsm_connect_test.py::test_failed_connection[3]
	  5.01s call     tests/storage/hsm_connect_test.py::test_connect[3-connections4]
	  5.01s call     tests/storage/hsm_connect_test.py::test_connect[3-connections1]
	  4.93s call     tests/storage/blocksd_test.py::test_create_domain_metadata[5]
	  4.92s call     tests/storage/blocksd_test.py::test_volume_metadata[5]
	  4.79s call     tests/storage/mailbox_test.py::TestCommunicate::test_roundtrip[63-0]

	On Travis tests are about 3 times faster, and we run more tests:

	= 2112 passed, 77 skipped, 100 deselected, 6 xfailed, 5 warnings in 266.79 seconds =

	========================== slowest 20 test durations ===========================
	  5.47s call     tests/storage/blocksd_test.py::test_dump_sd_metadata[5]
	  5.30s call     tests/storage/blocksd_test.py::test_dump_sd_metadata[4]
	  5.01s call     tests/storage/hsm_connect_test.py::test_refresh_storage_once[3-expected_calls4]
	  5.01s call     tests/storage/hsm_connect_test.py::test_connect[3-connections1]
	  5.01s call     tests/storage/hsm_connect_test.py::test_failed_connection[3]
	  5.00s call     tests/storage/hsm_connect_test.py::test_connect[3-connections4]
	  4.24s call     tests/storage/mailbox_test.py::TestCommunicate::test_roundtrip[63-0.05]
	  3.90s call     tests/storage/formatconverter_test.py::test_convert_to_v5_block[3]
	  3.73s call     tests/storage/formatconverter_test.py::test_convert_to_v5_block[4]
	  3.29s call     tests/storage/blocksd_test.py::test_create_snapshot_size[4]
	  3.12s call     tests/storage/blocksd_test.py::test_spm_lifecycle
	  2.92s call     tests/storage/blocksd_test.py::test_create_snapshot_size[5]
	  2.90s call     tests/storage/blocksd_test.py::test_create_domain_metadata[5]
	  2.89s call     tests/storage/blocksd_test.py::test_create_domain_metadata[4]
	  2.84s call     tests/storage/blocksd_test.py::test_volume_life_cycle[5]
	  2.78s call     tests/storage/blocksd_test.py::test_create_domain_metadata[3]
	  2.73s call     tests/storage/blocksd_test.py::test_volume_life_cycle[4]
	  2.64s call     tests/storage/blocksd_test.py::test_volume_life_cycle[3]
	  2.63s call     tests/storage/mailbox_test.py::TestCommunicate::test_roundtrip[32-0.05]
	  2.34s call     tests/storage/lvm_test.py::test_vg_extend_reduce

2020-05-22  Vojtech Juranek  <vjuranek@redhat.com>

	curl: add HTTP GET function
	Add a function for HTTP GET, which returns the whole output as-is.

	curl: move curl execution into separate function
	Refactor curl execution into a separate function so it can be reused
	in follow-up patches that will add other functionality.

	Also fix the default argument of the head() function so it no longer
	uses a mutable value.

2020-05-21  Nir Soffer  <nsoffer@redhat.com>

	docker: Upgrade tox to 3.15
	We pin the tox version to avoid an uncontrolled upgrade breaking the
	build. I've been using tox 3.15 locally and in imageio CI for a while
	without issues; let's upgrade.

2020-05-21  Amit Bawer  <abawer@redhat.com>

	xlease_test: Assert on set of leases keys
	Minor fix per CR comment [1].

	[1] https://gerrit.ovirt.org/#/c/109162/3/tests/storage/xlease_test.py@446

	Bug-Url: https://bugzilla.redhat.com/1557147

	API, fileSD, blockSD: Add full parameter to dumpStorageDomain()
	As dumping leases can be time- and bandwidth-consuming for an SD with
	1000 volumes, it is useful to have a way to toggle whether to dump
	only SD metadata and volumes info. We add full=False to dump only the
	volumes and SD info. If full=True, the API will also dump leases,
	lockspace and xleases information.

	Time measures for dump sections in block SD with 1000 volumes:

	   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
	        1    0.000    0.000   48.176   48.176 API.py:1063(StorageDomain.dump)
	        1    0.000    0.000   48.139   48.139 blockSD.py:1732(BlockStorageDomain.dump)
	        1    0.001    0.001   44.540   44.540 blockSD.py:1825(BlockStorageDomain._dump_leases)
	        1    0.011    0.011    2.334    2.334 blockSD.py:1749(BlockStorageDomain._dump_volumes)
	        1    0.000    0.000    0.132    0.132 sd.py:1588(BlockStorageDomain.dump_lockspace)
	        1    0.000    0.000    0.089    0.089 sd.py:1455(BlockStorageDomain.dump_external_leases)

	Time measures for dump sections in nfs SD with 1000 volumes:

	   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
	        1    0.000    0.000    8.720    8.720 API.py:1063(StorageDomain.dump)
	        1    0.000    0.000    8.691    8.691 fileSD.py:842(NfsStorageDomain.dump)
	        1    0.006    0.006    8.302    8.302 fileSD.py:858(NfsStorageDomain._dump_volumes)
	        1    0.000    0.000    0.237    0.237 fileSD.py:910(NfsStorageDomain._dump_leases)
	        1    0.000    0.000    0.096    0.096 sd.py:1455(NfsStorageDomain.dump_external_leases)
	        1    0.000    0.000    0.045    0.045 sd.py:1588(NfsStorageDomain.dump_lockspace)

	Bug-Url: https://bugzilla.redhat.com/1557147

	fileSD: Add 'xleases' section to dumpStorageDomain()
	API schema provides the xlease volume offset with the
	updating indication per lease-id (VM uuid):

	 "xleases": {
	   "00373f1a-1700-4c73-adc3-096377f7fb37": {
	        "offset": 97517568,
	        "updating": false
	   },
	  ...
	 }

	Bug-Url: https://bugzilla.redhat.com/1557147

	blockSD: Add 'xleases' section to dumpStorageDomain()
	API schema provides the xlease volume offset with the
	updating indication per lease-id (VM uuid):

	 "xleases": {
	        "48de6e2b-93eb-46d8-976c-c6c19573abff": {
	            "offset": 3145728,
	            "updating": false
	        },
	   ...
	 }

	Bug-Url: https://bugzilla.redhat.com/1557147

	xlease: Don't break on bad index record when dumping leases
	For the bulk API we would like to dump all the valid xlease index
	records, so only log a bad record and continue to the next one.

	A future enhancement to also dump the bad records could be useful
	for analysis purposes.

	Bug-Url: https://bugzilla.redhat.com/1557147
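
	The change follows the standard log-and-continue pattern: a bad index
	record is logged and skipped instead of aborting the dump. A minimal
	sketch (record parsing is stubbed out; names hypothetical, not the
	actual xlease module):

```python
import logging

log = logging.getLogger("storage.xlease")


def parse_record(data):
    """Stub parser: raise ValueError for a corrupted record."""
    if data is None:
        raise ValueError("bad record")
    return data


def dump_leases(records):
    """Yield all valid records, logging and skipping bad ones."""
    for index, data in enumerate(records):
        try:
            yield parse_record(data)
        except ValueError as e:
            # Previously a bad record aborted the dump; now we only log.
            log.warning("Skipping bad index record %d: %s", index, e)


# The corrupted middle record (None) is skipped, both valid ones survive.
leases = list(dump_leases([{"offset": 0}, None, {"offset": 1048576}]))
```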

	fileSD: Add 'lockspace' section to dumpStorageDomain()
	Dump lockspace information from the "sanlock direct dump"
	output for the ids volume:

	    "lockspace": [
	        {
	            "gen": 3,
	            "lockspace": "fef1dcb3-aad1-4918-8307-a53930818b5a",
	            "offset": 0,
	            "own": 1,
	            "resource": "70b590c7-9c8b-4e9b-932b-dce623965f5c.vm-18-54.e",
	            "timestamp": 1404
	        }
	    ]

	Bug-Url: https://bugzilla.redhat.com/1557147

	blockSD: Add 'lockspace' section to dumpStorageDomain()
	Dump the output of "sanlock direct dump" for the ids volume:

	   "lockspace": [
	        {
	            "gen": 3,
	            "lockspace": "f7bf4ac4-1fda-4788-af10-20380ff13a6c",
	            "offset": 0,
	            "own": 1,
	            "resource": "a7ab95f0-2bcb-4671-a3fb-8271ed982749.localhost.",
	            "timestamp": 915
	        }
	    ]

	Bug-Url: https://bugzilla.redhat.com/1557147

	fileSD: Add 'leases' section to dumpStorageDomain()
	Dump leases information from "sanlock direct dump" for the
	leases volume:

	"leases": [
	        {
	            "gen": 3,
	            "lockspace": "fef1dcb3-aad1-4918-8307-a53930818b5a",
	            "lver": 5,
	            "offset": 1048576,
	            "own": 1,
	            "resource": "SDM",
	            "timestamp": 1353
	        }
	    ]

	Bug-Url: https://bugzilla.redhat.com/1557147

	blockSD: Add 'leases' section to dumpStorageDomain()
	Dump the leases information from "sanlock direct dump"
	output for the leases volume:

	    "leases": [
	        {
	            "gen": 1,
	            "lockspace": "f7bf4ac4-1fda-4788-af10-20380ff13a6c",
	            "lver": 1,
	            "offset": 1048576,
	            "own": 1,
	            "resource": "SDM",
	            "timestamp": 0
	        },
	        {
	            "gen": 0,
	            "lockspace": "f7bf4ac4-1fda-4788-af10-20380ff13a6c",
	            "lver": 0,
	            "offset": 105906176,
	            "own": 0,
	            "resource": "bc00ab08-30fe-418b-9103-8ed02a40c270",
	            "timestamp": 0
	        },
	        {
	            "gen": 0,
	            "lockspace": "f7bf4ac4-1fda-4788-af10-20380ff13a6c",
	            "lver": 0,
	            "offset": 106954752,
	            "own": 0,
	            "resource": "1d5f122f-4796-436c-aa38-c3f96635fc03",
	            "timestamp": 0
	        },
	      ...
	   ]

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-05-21  Dominik Holler  <dholler@redhat.com>

	net, nmstate: Add notifications for ipv6 stateless
	If the IPv6 address is generated on the host,
	DHCP6_IP6_ADDRESS is not set in the dispatcher script.

	IP6_ADDRESS_0 is set on SLAAC, stateful and stateless DHCPv6,
	but not if a static IPv6 address is set.

	Bug-Url: https://bugzilla.redhat.com/1690485

2020-05-19  Nir Soffer  <nsoffer@redhat.com>

	travis: Expose /dev/log from host
	When running sanlock_direct_test.py on travis, sanlock syslog messages
	leak to stderr, breaking test output:

	    storage/sanlock_direct_test.py sanlock-direct[13001]: init lockspace
	    LS:1:/var/tmp/vdsm-storage/mount.file-512/file:0 0x110

	This happens because sanlock is using:

	    openlog("sanlock-direct", LOG_CONS | LOG_PID, LOG_DAEMON);

	And it logs warnings for every "sanlock direct" command:

	    syslog(LOG_WARNING, "init lockspace %.48s:%llu:%s:%llu 0x%x", ...

	There is no /dev/log in the container, so syslog writes the message to
	/dev/console, which seems to be stderr in the container.

	This was fixed in sanlock in:

	commit 315a0b90612a2b6c3b11d33b48385aa1352fa671
	Author: Nir Soffer <nsoffer@redhat.com>
	Date:   Tue May 19 09:24:43 2020 -0500

	    sanlock: remove LOG_CONS from direct command

	But it will take time until this fix is available in all supported
	distros.

	Mounting host's /dev/log in the container fixes this issue with current
	sanlock, logging the messages to host syslog.

	tests: Improve dump_holes test
	Change the volume layout to simulate the leases volume. The first slot
	is always empty (was used by safelease in the past). Then we have some
	reserved slots, and then volume leases.

	The test shows that when we specify size, block size and alignment,
	sanlock handles holes properly at the start, middle and end. This means
	that we can always dump the leases volume from offset 0, so we don't
	need to amend relative offsets in old sanlock versions.

	Tested with sanlock 3.8.0 and 3.8.1 on Fedora 31.

	tests: Simplify dump_leases test
	We don't need to initialize a lockspace to create resources; sanlock
	direct does not care if the lockspace exists.

	When creating resources, use zero-based resource names to make the
	layout clearer; resource "RS<N>" is at offset N * align.

2020-05-19  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: Disable randomly failing tests
	'testClientNotify' and 'testsMethodBadParameters' tests fail randomly
	in CI - let's disable them for now.

2020-05-19  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.17

2020-05-18  Ales Musil  <amusil@redhat.com>

	net, nmstate: Send all dhcp info in event
	The dispatcher dhcp_monitor was sending only
	information that were available however vdsmd was
	expecting everything. This was not causing any issues
	with ipv4 because we had everything we needed.

	Ipv6 on the other hand did miss some information.
	Sending none instead of missing field will prevent
	exceptions thrown because of that.

	This mechanism was heavily relied on by functional
	tests. This fix speeds their execution by up to 15%
	depending on machine.
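
	The fix amounts to emitting every field vdsmd expects, substituting
	None where the dispatcher had no data. A minimal sketch of the idea
	(field names are hypothetical, not the actual event schema):

```python
# Fields the consumer expects in every DHCP event (hypothetical names).
EXPECTED_FIELDS = ("ip_address", "netmask", "gateway", "dns_servers")


def build_event(available):
    """Return a full event dict, filling missing fields with None."""
    return {field: available.get(field) for field in EXPECTED_FIELDS}


# An IPv6 dispatcher run often has only part of the data; the built
# event still contains every expected key, so the consumer never
# raises KeyError on a missing field.
event = build_event({"ip_address": "fd00::10"})
```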

2020-05-18  Amit Bawer  <abawer@redhat.com>

	sd: Move RESERVED_LEASES to sd constants
	This will serve dumping leases for both fileSD and blockSD next.

	Bug-Url: https://bugzilla.redhat.com/1557147

	fakesanlock: Add dump_leases and dump_lockspace methods
	This will be of use for dumping leases and lockspaces
	information in next SD tests.

	Lease dump entry:

	    {
	        'offset': 0,
	        'lockspace': 'LS1',
	        'resource': 'RS1',
	        'timestamp': 0,
	        'own': 0,
	        'gen': 0,
	        'lver': 0
	    }

	Lockspace dump entry:

	    {
	        'offset': 0,
	        'lockspace': 'LS1',
	        'resource': 'RS1',
	        'timestamp': 0,
	        'own': 0,
	        'gen': 0
	    }

	Bug-Url: https://bugzilla.redhat.com/1557147

	sanlock_direct: Add module for sanlock direct command
	The module allows executing "sanlock direct dump path:offset:size"
	with the corresponding parameters, to query the sanlock leases and
	lockspace volumes and dump their records.

	Leases dump format:

	[
	    {
	        'offset': 1048576,
	        'lockspace': 'LS1',
	        'resource': 'RS1',
	        'timestamp': 0,
	        'own': 0,
	        'gen': 0,
	        'lver': 0
	    },
	    ...
	]

	Lockspace dump format:

	[
	    {
	        'offset': 0,
	        'lockspace': 'LS1',
	        'resource': 'HOST_NAME',
	        'timestamp': 447904,
	        'own': 1,
	        'gen': 2
	    },
	    ...
	]

	Bug-Url: https://bugzilla.redhat.com/1557147

	fileSD: Include removed image volumes in dumpStorageDomain()
	This will be of use for support analysis on removed volumes,
	providing their available metadata.

	Example:

	'volumes': {
	    'd84fb2ac-cb5d-42a5-b1ca-ce9461402ea6': {
	        'capacity': 2147483648,
	        'ctime': 1550522547,
	        'description': 'test',
	        'disktype': 'DATA',
	        'format': 'RAW',
	        'generation': 0,
	        'image': '0c853966-c62c-4735-a55b-4fd2a101bbb6',
	        'legality': 'LEGAL',
	        'parent': '00000000-0000-0000-0000-000000000000',
	        'type': 'SPARSE',
	        'voltype': 'LEAF',
	        'status': 'REMOVED',
	        'truesize': 4096,
	        'apparentsize': 2147483648
	    }
	    ...
	}

	Bug-Url: https://bugzilla.redhat.com/1557147

	blockSD: Include removed image volumes in dumpStorageDomain()
	This will be of use for support analysis on removed volumes,
	providing their available metadata.

	Example:

	'volumes': {
	    '7a1ecd82-ed54-4e8b-913f-5acfdeefd3b8': {
	        'capacity': 10737418240,
	        'ctime': 1582196150,
	        'description': 'test',
	        'disktype': 'DATA',
	        'format': 'COW',
	        'generation': 0,
	        'image': '1a6a693a-0716-4416-9927-ea9a10756f76',
	        'legality': 'LEGAL',
	        'parent': '00000000-0000-0000-0000-000000000000',
	        'type': 'SPARSE',
	        'voltype': 'LEAF',
	        'status': 'REMOVED',
	        'mdslot': 4,
	        'truesize': 1073741824,
	        'apparentsize': 1073741824
	    },
	    ...
	}

	Bug-Url: https://bugzilla.redhat.com/1557147

	blockSD: Dump volume status as invalid if failed to get its size
	Keep behavior consistent with fileSD volumes dump.

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-05-18  Andrej Cernek  <acernek@redhat.com>

	net, tests: enable unicode bridge name test on py3
	The test did not work on py3, since during command execution all string
	arguments are converted to locale encoding, which was previously ASCII.
	Since the encoding has already been set to UTF-8 the test passes.

2020-05-18  Liran Rotenberg  <lrotenbe@redhat.com>

	virt: snapshot failure handling
	The jobs framework uses only unhandled exceptions to determine the
	result of a job. When we returned an error response code, the job was
	marked as 'done'; therefore, the engine would see the job as
	succeeded.

	This patch changes the exception handling in the snapshot job from
	returning an error response code to raising an unhandled exception
	that will be caught by the jobs framework and mark the job as a
	failure.

	Bug-Url: https://bugzilla.redhat.com/1835096
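
	The pattern at play: a job runner that infers failure only from
	unhandled exceptions silently treats a returned error code as
	success. A hedged sketch of the control flow (class and function
	names are hypothetical, not vdsm's actual jobs framework):

```python
class Job:
    """Minimal job runner deriving job status from exceptions only."""

    def __init__(self, func):
        self._func = func
        self.status = "pending"
        self.error = None

    def run(self):
        try:
            self._func()
        except Exception as e:
            # Only an unhandled exception marks the job as failed.
            self.status = "failed"
            self.error = str(e)
        else:
            # Any normal return, even an error response dict, is "done".
            self.status = "done"


def snapshot_old():
    # Old behavior: the error response is returned, and swallowed.
    return {"status": {"code": 48, "message": "Snapshot failed"}}


def snapshot_new():
    # New behavior: raise, so the runner records the failure.
    raise RuntimeError("Snapshot failed")


old_job = Job(snapshot_old)
old_job.run()
new_job = Job(snapshot_new)
new_job.run()
# old_job ends up "done" (the engine would see success);
# new_job ends up "failed" with the error recorded.
```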

2020-05-17  Amit Bawer  <abawer@redhat.com>

	blocksd_test: Minor variables names refactor for change_vol_tag()
	Refactor variables following CR comment [1]

	[1] https://gerrit.ovirt.org/#/c/108000/26/tests/storage/blocksd_test.py@119

	localfssd_test: Add negative tests for dump volumes
	- Volume with corrupted meta file.
	- Volume with inaccessible meta file for parsing.
	- Volume with a bad size query.
	- Volume with deleted image.

	API, fileSD: Add 'volumes' section for dumpStorageDomain()
	Schema for volume metadata output is based on VolumeInfoResponse
	with the following differences:

	- "domain" is not part of the volume output.

	- "mdslot" integer is added for the metadata slot number of the volume.
	           This field only applies to volumes on block storage domain.

	- "mtime" is not part of the output. It was always reported as zero anyway.

	- "children" are not part of the output. The caller can find the volume
	             children by processing the returned data.

	- "apparentsize" and "truesize" are returned as integer instead of strings.

	- "lease" is not included, we will add info about leases later.

	Example output:

	  "volumes": {
	        "0002f19c-d05c-4a23-969e-334343019d3f": {
	            "apparentsize": 196624,
	            "capacity": 1073741824,
	            "ctime": 1581627145,
	            "description": "{\"DiskAlias\":\"file_disk\",\"DiskDescription\":\"file_disk\"}",
	            "disktype": "DATA",
	            "format": "COW",
	            "generation": 0,
	            "image": "7cc8ed67-84c9-4d66-a960-f2122ec786c1",
	            "legality": "LEGAL",
	            "parent": "00000000-0000-0000-0000-000000000000",
	            "status": "OK",
	            "truesize": 200704,
	            "type": "SPARSE",
	            "voltype": "LEAF"
	        },
	   ...
	   }

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-05-17  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: implement redefine checkpoints
	Implement redefine_checkpoints verb.

	API: change redefine checkpoint and delete checkpoint API
	According to the changes in libvirt, in order to redefine a checkpoint
	libvirt needs the checkpoint XML. The Engine stores this checkpoint XML
	after checkpoint creation in its database.

	The API to redefine the checkpoints of a VM changed to redefine a list
	of checkpoint each time instead of redefining all the checkpoints in one
	API call, this is needed because the checkpoint XML contains also the VM
	domain XML and uses a big amount of data. Separating the redefinition
	of the checkpoints will allow the Engine to know exactly which
	checkpoint redefinition failed and maybe recover from it.

	So now, the Engine request should be:

	vdsm-client -f redefine.json VM redefine_checkpoints

	{
	  'vmID': vm_id,
	  'checkpoints': [
	    {
	     'id': 'checkpoint_1',
	     'xml': 'checkpoint_1_xml'
	    },
	    {
	     'id': 'checkpoint_2',
	     'xml': 'checkpoint_2_xml'
	    },
	    ...
	  ]
	}

	The response will be a list of checkpoint IDs that were redefined
	successfully.
	If checkpoint redefinition failed, the list will not include the failed
	checkpoint ID and the ones that follow it and the response will contain
	the error description (code and message).

	For deleting checkpoints, the behavior will be similar but the list will
	include only the checkpoints IDs:

	vdsm-client -f delete.json VM delete_checkpoints

	{
	  'vmID': vm_id,
	  'checkpoint_ids': [checkpoint_1, checkpoint_2, ... ]
	}

	The response will be a list of checkpoint IDs that were deleted
	successfully.
	If checkpoint deletion failed, the list will not include the failed
	checkpoint ID and the ones that follow it and the response will contain
	the error description (code and message).

	The drawback of this change is that the operations will take longer,
	because now the Engine will redefine or delete the checkpoints in
	batches.

	backup.py: implement list_checkpoints()

2020-05-15  Eyal Shenitzky  <eshenitz@redhat.com>

	API: add new list_checkpoints() API call
	New VM.list_checkpoints() API call will be used to fetch the IDs of the
	checkpoints that are defined on the given VM ID.

	The list of checkpoint IDs that are defined on the VM will allow the
	engine to decide if it should redefine missing checkpoints before
	starting an incremental backup.

	For example:
	vdsm-client VM list_checkpoints vmID=xxx-yyy

	{
	  'vmID': vm_id
	}

	The response will be a list of checkpoint IDs that are defined on the VM
	ordered from the oldest checkpoint to the leaf.

	[
	  "checkpoint-id-1",
	  "checkpoint-id-2"
	]

2020-05-15  Amit Bawer  <abawer@redhat.com>

	docker: Add sanlock to dependencies
	This will be of use for sanlock direct commands used
	in Travis tests.

2020-05-14  Amit Bawer  <abawer@redhat.com>

	curl-img-wrap: Use non-cached flushed writes for download command
	Reduce the kernel write buffer and memory cache contention between
	ongoing glance image imports and sanlock lockspace writes, which
	induces IO timeouts (-202) for the latter.

	Gathering test runs of 10 trials of 1.6 GB image imports per
	each dd setup of the curl-img download job shows that we can stick
	to bs=2M, as using a larger bs with the oflag=nocache,dsync option
	doesn't show significant improvement in transfer rates.

	Adding oflag=nocache,dsync introduces a performance regression
	compared to the old setup (2x slower transfer rates for an nfs
	storage domain).

	Using oflag=direct would preserve the original performance, but as
	the QCOW2 images being uploaded are not storage aligned, this would
	also require considerable work.

	SD      bs (M)      oflag           Min (MB/s)  Max (MB/s)  Average (MB/s)
	--------------------------------------------------------------------------
	block   2           -               9.2         13.0        11.9
	nfs     2           -               10.8        12.9        11.8
	block   4           -               11.7        13.3        12.4
	nfs     4           -               11.0        13.7        12.3
	block   8           -               12.3        13.3        12.5
	nfs     8           -               12.2        13.3        12.8
	block   16          -               11.9        12.9        12.6
	nfs     16          -               11.8        13.2        12.3
	block   32          -               11.5        13.0        12.0
	nfs     32          -               9.6         12.6        11.1
	block   2           nocache,dsync   9.4         10.3        9.8
	nfs     2           nocache,dsync   6.3         7.8         6.6
	block   4           nocache,dsync   8.7         10.9        9.4
	nfs     4           nocache,dsync   6.1         7.2         6.4
	block   8           nocache,dsync   9.2         11.7        10.1
	nfs     8           nocache,dsync   6.3         7.1         6.7
	block   16          nocache,dsync   9.0         11.1        10.9
	nfs     16          nocache,dsync   6.5         7.1         6.9
	block   32          nocache,dsync   9.0         11.3        10.4
	nfs     32          nocache,dsync   6.0         8.1         7.6
	block   2           direct          12.1        13.5        12.8
	nfs     2           direct          7.5         13.6        12.0
	block   4           direct          11.3        12.8        12.3
	nfs     4           direct          12.4        13.2        12.9
	block   8           direct          12.5        13.2        12.8
	nfs     8           direct          11.3        13.3        12.0
	block   16          direct          14.2        17.1        16.2
	nfs     16          direct          15.7        18.0        16.3
	block   32          direct          12.4        14.4        13.3
	nfs     32          direct          8.1         11.0        8.7

	Bug-Url: https://bugzilla.redhat.com/1832967
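
	The resulting download pipeline, roughly, streams the image through
	dd with bs=2M and non-cached synchronous writes. A sketch of the
	invocation (the function name, URL and paths are illustrative, not
	the actual curl-img-wrap script):

```shell
# Stream an image to a volume, bypassing the page cache (nocache),
# syncing every block (dsync) and flushing once at the end (fsync),
# so the writes do not compete with sanlock lockspace IO.
download_image() {
    url=$1
    out=$2
    curl --silent --fail "$url" \
        | dd of="$out" bs=2M oflag=nocache,dsync conv=fsync 2>/dev/null
}
```

	With GNU dd, oflag=nocache advises the kernel to drop cached pages
	for the output, while dsync opens the output with O_DSYNC.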

2020-05-13  Nir Soffer  <nsoffer@redhat.com>

	spec: Require ovirt-imageio-* >= 2.0.6
	When switching from explicit 2.0.5 requirement, we forgot to switch back
	to >=.

2020-05-13  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.16

2020-05-13  Nir Soffer  <nsoffer@redhat.com>

	spec: Require ovirt-imageio >= 2.0.6
	This version introduces incompatible changes:
	- Internal modules moved to ovirt_imageio._internal
	- Configuration is done by drop-in configuration files in
	  /etc/ovirt-imageio/conf.d

	Update kvm2ovirt, using internal module to import from the _internal
	package.

	Update the vdsm configuration to install a drop-in file in
	/etc/ovirt-imageio/conf.d/50-vdsm.conf, and explain how a user can
	override settings in the vdsm configuration.

	Since we no longer support user modifications to vdsm's ovirt-imageio
	configuration, we don't need to mark it as config(noreplace).

2020-05-12  Milan Zamazal  <mzamazal@redhat.com>

	numa: Always use int values for online cpu ids
	online_cpus in CpuTopology can be set to a list of either string or
	integer values, depending on whether libvirt reports NUMA cells or
	not.  With NUMA cells reported, a list of str values is retrieved from
	libvirt XML, otherwise a list of int values is obtained from
	taskset.online_cpus().  In the latter case, a type error is raised
	when the list of integers is joined as strings while onlineCpus value
	is created in caps.get().

	Let's unify the types in CpuTopology to integers, to be consistent
	with CPU ids elsewhere; and let's fix caps.get(), the only place that
	uses CpuTopology.online_cpus, to convert them to strings.

	Bug-Url: https://bugzilla.redhat.com/1834873
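
	The failure mode is the usual str.join-on-integers TypeError; a
	minimal illustration (values made up):

```python
online_cpus = [0, 1, 2, 3]  # ints, as from taskset.online_cpus()

# Joining ints directly is what used to blow up in caps.get():
try:
    ",".join(online_cpus)
    raised = False
except TypeError:
    raised = True

# With CpuTopology unified on int cpu ids, caps.get() converts
# explicitly at the one place a string is needed:
online_cpus_str = ",".join(str(cpu) for cpu in online_cpus)
```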

2020-05-12  Ales Musil  <amusil@redhat.com>

	vdsm: Use UTF8 as default locale
	To be able to use unicode in commands we need to
	switch every locale in vdsm to use UTF8 instead
	of ASCII.

	net, api: Set netConfigDirty correctly
	netConfigDirty is a flag that serves as a hint to the engine about
	whether the current network configuration is persisted. If the flag
	is set, the network configuration is not saved and might be lost on
	reboot.

	When commitOnSuccess was used, netConfigDirty was set to true despite
	the fact that the configuration is automatically persisted. Ensure
	that the flag is set accordingly.

	Bug-Url: https://bugzilla.redhat.com/1798818
	Bug-Url: https://bugzilla.redhat.com/1834248

2020-05-12  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.15

2020-05-12  Marcin Sobczyk  <msobczyk@redhat.com>

	ci: Make virtualenv requirement more accurate
	New 'virtualenv' versions break our CI by pulling in a newer
	'six' version. We fixed it by requiring the last version
	that worked for us, but the requirement proposed by this patch
	is more accurate.

2020-05-12  Uri Lublin  <uril@redhat.com>

	typo fix: s/defualt/default/

2020-05-11  Amit Bawer  <abawer@redhat.com>

	qemuimg: Amend logger name to storage.qemuimg

2020-05-11  Nir Soffer  <nsoffer@redhat.com>

	curl-img-wrap: Sync after upload is complete
	Add conv=fsync to ensure that data reaches physical storage when the
	upload is done. Without it, power loss on the host or on the storage
	server can result in a corrupted image or wrong image metadata.

	curl-img-wrap: Reformat commands
	Use one argument per line to make the structure of the command more
	clear and make future changes cleaner.

	imageio: Require ovirt-imageio 2.0.5
	ovirt-imageio 2.0.6 will introduce incompatible changes. Require the
	latest version compatible with current vdsm, so that when
	ovirt-imageio 2.0.6 is released, vdsm will not break in developers'
	setups.

	When ovirt-imageio 2.0.6 is available, we will require it and update
	vdsm to work with it.

	blockSD: Fix discard volume
	Discarding volumes when wipe-after-delete is not set always fails with:

	2020-05-08 22:39:22,076+0300 WARN  (tmap-18/0) [storage.blockdev]
	Discarding device /dev/vg-name/lv-name failed: Command
	['/sbin/blkdiscard', '--step', '33554432', '/dev/vg-name/lv-name']
	failed with rc=1 out=b'' err=b'blkdiscard:  cannot open
	/dev/vg-name/lv-name: No such file or directory\n' (blockdev:106)

	It turns out that when we separated lvm activation from setting tags,
	we broke purgeImage, which assumed that a volume was activated when
	setting the __remove_me tag. The issue was hidden because we treat
	discard as best effort and don't fail on errors.

	Fixes commit 9d527d7c76ec (storage: activate/deactivate LVs before/after
	zeroing).

	Bug-Url: https://bugzilla.redhat.com/1833791

2020-05-11  Dominik Holler  <dholler@redhat.com>

	network: bond_monitor: Use concurrent.thread()
	Replace threading.Thread() with vdsm.common.concurrent.thread().
	This makes bond_monitor use the same pattern as
	dhcp_monitor.
	The message of commit afb4a42738f536b07d4dd8558f454c613daa4e22
	explains more details.

2020-05-07  Milan Zamazal  <mzamazal@redhat.com>

	spec: Require swtpm
	This is needed for emulated TPM device support.  The TPM device
	support itself is implemented in Engine.

2020-05-06  Amit Bawer  <abawer@redhat.com>

	blocksd_test: Add negative cases for test_dump_sd_metadata
	- Uninitialized volume.
	- Volume marked as deleted.
	- Volume marked as zeroed.
	- Volume with bad metadata tag.
	- Volume with bad metadata.

	API, blockSD: Add 'volumes' section to dumpStorageDomain()
	Schema for volume metadata output is based on VolumeInfoResponse
	with the following differences:

	- "domain" is not part of the volume output.

	- "mdslot" integer is added for the metadata slot number of the volume.
	           This field only applies to volumes on block storage domain.

	- "mtime" is not part of the output. It was always reported as zero
	          anyway.

	- "children" are not part of the output. The caller can find the volume
	             children by processing the returned data.

	- "apparentsize" and "truesize" are returned as integer instead of strings.

	- "lease" is not included, we will add info about leases later.

	Example output:

	    "volumes": {
	        "0049b422-7cb8-47b9-8af6-d9cea7614656": {
	            "apparentsize": 1073741824,
	            "capacity": 1073741824,
	            "ctime": 1579535791,
	            "description": "{\"DiskAlias\":\"mydisk\",\"DiskDescription\":\"Mydisk\"}",
	            "disktype": "DATA",
	            "format": "COW",
	            "generation": 0,
	            "image": "c0ff0170-3a63-4ff9-abf2-8d463b2e4af9",
	            "legality": "LEGAL",
	            "mdslot": 785,
	            "parent": "00000000-0000-0000-0000-000000000000",
	            "status": "OK",
	            "truesize": 1073741824,
	            "type": "SPARSE",
	            "voltype": "LEAF"
	        },
	   ...
	   }

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-05-06  Ales Musil  <amusil@redhat.com>

	net, nmstate: Suppress traceback from dhcp monitor
	The DHCP monitor would produce a traceback in the journal
	if it tried to contact a vdsmd that is not running.

	Suppress this traceback and print a friendly message
	instead.

2020-05-06  Amit Bawer  <abawer@redhat.com>

	tests: Increase py-watch wait period to 1 second for test_kill_grandkids
	On congested CI runs the tested sub-process does not always make it to
	the stdout phase before the py-watch timeout expires, omitting the
	grandchild pid from the output and wrongly parsing the py-watch
	delimiter line instead:

	>       grandkid_pid = int(e.value.out.splitlines()[0])
	E       ValueError: invalid literal for int() with base 10: b'============================================================='

	pywatch_test.py:124: ValueError

2020-05-06  Nir Soffer  <nsoffer@redhat.com>

	network: dhcp_monitor: Use concurrent.thread()
	Replace threading.Thread() with vdsm.common.concurrent.thread().

	vdsm.common.concurrent.thread() has many advantages:

	- If the thread function raises, it logs a traceback to the vdsm log
	  instead of failing silently.
	- It creates a daemon thread by default that will never block shutdown.
	- It sets the underlying pthread name, making it easier to debug.
	- It logs a debug message when the thread starts and finishes.

	network: dhcp_monitor: Fix unix socket handling
	The monitor assumed that the socket is always removed during shutdown,
	which is not a valid assumption. For example, vdsm may be killed before
	shutting down the monitor, or after shutting down the monitor but
	before calling os.remove().

	This results in this unrecoverable error when starting vdsm:

	2020-04-30 13:01:05,097-0400 ERROR (MainThread) [vds] Exception raised (vdsmd:164)
	Traceback (most recent call last):
	  File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 162, in run
	    serve_clients(log)
	  File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 115, in serve_clients
	    init_unprivileged_network_components(cif, supervdsm.getProxy())
	  File "/usr/lib/python3.6/site-packages/vdsm/network/initializer.py", line 46, in init_unprivileged_network_components
	    dhcp_monitor.initialize_monitor(cif, net_api)
	  File "/usr/lib/python3.6/site-packages/vdsm/network/dhcp_monitor.py", line 151, in initialize_monitor
	    raise e
	  File "/usr/lib/python3.6/site-packages/vdsm/network/dhcp_monitor.py", line 145, in initialize_monitor
	    monitor = Monitor.instance()
	  File "/usr/lib/python3.6/site-packages/vdsm/network/dhcp_monitor.py", line 94, in instance
	    _monitor_instance = Monitor(**kwargs)
	  File "/usr/lib/python3.6/site-packages/vdsm/network/dhcp_monitor.py", line 78, in __init__
	    socket_path, _MonitorHandler
	  File "/usr/lib64/python3.6/socketserver.py", line 456, in __init__
	    self.server_bind()
	  File "/usr/lib64/python3.6/socketserver.py", line 470, in server_bind
	    self.socket.bind(self.server_address)
	OSError: [Errno 98] Address already in use

	Vdsm ends up in an endless restart loop. The only way to recover is to
	remove the leftover socket.

	The only way to ensure that you can bind to a unix socket is to remove
	the socket before trying to bind. Add a _remove_socket() helper and
	call it before creating the server, and after stopping the monitor but
	before shutting it down, to make it less likely to leave a leftover
	socket during shutdown.

	When removing the socket, do not warn about a missing socket, which is
	an expected condition when starting the server. When logging a warning
	after socket removal fails, include the underlying error message
	instead of hiding it.

	sitecustomize: Remove sys.setdefaultencoding() hack
	In python 2 we used a hack to set the default encoding to "utf-8"
	instead of "ascii":

	    sys.setdefaultencoding('utf8')

	This hack is not needed in python 3, since the default encoding is
	already "utf-8", and we haven't supported python 2 for a while.

	common: time: Use builtin time.monotonic()
	Python 3 introduced a high resolution (nanoseconds) monotonic time
	source, so we don't need to use the os.times() hack that was limited to
	10 milliseconds resolution.

	Implementing event_time() with time.monotonic() breaks old engines,
	since they had wrong parsing code, assuming that event time is always a
	Java long value (> INT_MAX)[1]. This was hidden by the fact that
	os.times()[4] starts at 2**32 / 1000 on boot. To keep compatibility with
	older engine, we ensure that event time is always larger than INT_MAX.

	This change also fixes a possible issue with events and status reports
	submitted in the same 10 millisecond time frame, which could be
	processed in the wrong order on the engine side. Now all events have
	millisecond resolution.

	[1] https://gerrit.ovirt.org/c/108651

	Related-To: https://bugzilla.redhat.com/1828088
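	The INT_MAX compatibility offset described above could look roughly
	like this (a sketch; the actual offset used in vdsm.common.time may
	differ):

```python
import time

# Upper bound of a Java int; old engines assume event time > INT_MAX.
_INT_MAX = 2**31 - 1


def event_time():
    """Monotonic event time in milliseconds, offset so the value
    always exceeds INT_MAX for compatibility with old engine parsers
    that expect a Java long."""
    return _INT_MAX + int(time.monotonic() * 1000)
```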

	common: time: Introduce event_time()
	Notifications and VM status use monotonic_time() in milliseconds. The
	logic was duplicated in yajsonrpc.Notification._add_noitfy_time() and
	vdsm.virt.Vm._get_status_time(), but we did not have any documentation
	or other clue that these functions are related.

	Both functions must use the same time source, otherwise engine may
	ignore status reports or events, thinking that they are too old. This
	breaks VM life cycle management[1].

	Introduce new time.event_time() removing the duplication, and use the
	new function in places that need to use event time. This makes it harder
	to break engine by updating only one of the functions.

	[1] https://bugzilla.redhat.com/1828088

	Related-To: https://bugzilla.redhat.com/1828088

2020-05-05  Beni Pelled  <bpelled@redhat.com>

	virt: fix module import in v2v.py
	Recently ovirt_imageio_common was renamed to ovirt_imageio,
	and v2v.py should be updated accordingly.

	Bug-Url: https://bugzilla.redhat.com/1830944

2020-05-05  Dominik Holler  <dholler@redhat.com>

	net: Notify Engine if the active slave changes
	A new bond monitor is added, which sends a notification on the
	netlink event IFLA_EVENT_BONDING_FAILOVER, which indicates
	that the active slave of a mode 1 bond changed.

	Bug-Url: https://bugzilla.redhat.com/1671876
	Bug-Url: https://bugzilla.redhat.com/1801794

	net: Add netlink monitor active slaves of a bond
	The already existing netlink monitor generates an event which
	contains data from the libnl cached object model, triggered
	by receiving a netlink message.

	The change of the active slave of a mode 1 bond is indicated
	by the IFLA_EVENT_BONDING_FAILOVER, which is not included in
	libnl's object model.

	For this reason, an additional monitor is added, which checks
	the netlink message for IFLA_EVENTs.

	Bug-Url: https://bugzilla.redhat.com/1671876
	Bug-Url: https://bugzilla.redhat.com/1801794

	net, tests: unify bond_device contextmanager
	Integrate the bond_device contextmanager from
	integration/link_bond_test into the one from
	nettestlib.

	This prepares the usage of the functionality from
	the former bond_device from link_bond_test in additional tests.

2020-05-05  Artur Socha  <asocha@redhat.com>

	betterAsyncore: proper handling of SSL_ERROR_WANT_READ in recv
	This patch handles SSL_ERROR_WANT_READ signal and additionally
	fixes exception handlers order in 'recv'.
	'OSError' is superclass for 'sslutils.SSLError' and 'socket.error'
	is an alias for 'OSError' so as a result SSLError used to be swallowed.

	$ python3
	>>> import socket
	>>> assert socket.error is OSError

2020-05-05  Gobinda Das  <godas@redhat.com>

	Moving glusterfs pkgs from 3 to 6

2020-05-04  Amit Bawer  <abawer@redhat.com>

	automation: Update glusterfs el8 repository URL for gluster-7
	The old repo URL has become unavailable for CI runs; browsing its
	parent path [1] leads to a list of alternatives.

	Update the URL according to el8 recommendation for gluster-7.

	[1] https://download.gluster.org/pub/gluster/glusterfs/6/LATEST/RHEL/

2020-04-30  Nir Soffer  <nsoffer@redhat.com>

	tests: Move tests/common/time_test.py to pytest
	Replace VdsmTestCase and MonkeyPatch with pytest facilities. This module
	is run now by tox instead of tests/Makefile.

2020-04-29  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.14

2020-04-29  Eitan Raviv  <eraviv@redhat.com>

	network: report nmstate version
	Report nmstate version installed on the host alongside the other package
	versions being reported.

	Bug-Url: https://bugzilla.redhat.com/1803484

2020-04-28  Andrej Cernek  <acernek@redhat.com>

	net, tests: fix mirroring tests on py3
	While in py2 str and bytes are the same and can be compared for
	equality, in py3 such a comparison will fail. Therefore chr
	concatenation, which produces strings, must be replaced with explicit
	bytes literals (i.e. b'').
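	The py3 behavior the fix relies on can be demonstrated with:

```python
# py3: str and bytes are distinct types and never compare equal.
assert "ab" != b"ab"

# chr() concatenation produces str in py3, so test payloads must be
# built as explicit bytes instead.
payload = bytes([0x45, 0x00])   # rather than chr(0x45) + chr(0x00)
assert payload == b"\x45\x00"
```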

2020-04-24  Nir Soffer  <nsoffer@redhat.com>

	qemu: Support local qemu build
	When installing qemu from source it is installed to /usr/local. Change
	the search path so we use /usr/local/bin/qemu-* instead of
	/usr/bin/qemu-* if a local version is installed. This matches libvirt
	behavior, using /usr/local/bin/qemu-kvm if installed.

	With this we can test upstream patches without modifying vdsm code.

2020-04-24  Milan Zamazal  <mzamazal@redhat.com>

	virt: Fix typo in the copyright header of xmlconstants.py

2020-04-23  Nir Soffer  <nsoffer@redhat.com>

	storageServer: Shorten NFS timeouts
	Change the default timeout to 10 seconds, and number of retransmissions
	to 3. This should result in 60 seconds timeout before NFS request will
	fail, similar to the behaviour in multipath.

	According to nfs(5), NFS will retry a request after timeo deciseconds
	(timeo / 10 seconds). After each retransmission, the timeout is
	increased by the timeo value (up to a maximum of 600 seconds). After
	retrans retries, the NFS client fails with a "server not responding"
	message.

	This is the expected failure flow:

	00:00   retry 1 (10 seconds timeout)
	00:10   retry 2 (20 seconds timeout)
	00:30   retry 3 (30 seconds timeout)
	01:00   request fail

	In the past we were using timeout=600, retrans=6, which resulted in 21
	minutes timeout:

	00:00   retry 1 (60 seconds timeout)
	01:00   retry 2 (120 seconds timeout)
	03:00   retry 3 (180 seconds timeout)
	06:00   retry 4 (240 seconds timeout)
	10:00   retry 5 (300 seconds timeout)
	15:00   retry 6 (360 seconds timeout)
	21:00   request fail
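	Both failure flows above follow from the nfs(5) retransmission
	arithmetic, which can be checked with a small sketch:

```python
def nfs_failure_time(timeo_deciseconds, retrans):
    """Seconds until the NFS client gives up: the first timeout is
    timeo/10 seconds and each retransmission adds another timeo/10
    seconds to the timeout, so the total is the sum of the series."""
    base = timeo_deciseconds // 10
    return sum(base * n for n in range(1, retrans + 1))


print(nfs_failure_time(100, 3))   # new settings: 60 seconds
print(nfs_failure_time(600, 6))   # old settings: 1260 seconds = 21 minutes
```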

	Testing shows that the storage monitor was blocked on storage for 270
	seconds instead of the expected 60 seconds. Still, this is an
	improvement compared with the 20-30 minutes seen with the previous
	settings.

	A VM running on the blocked NFS storage failed to resume after
	unblocking the storage. This typically works with block storage.

	So this looks like an improvement, but more work may be needed.

	Bug-Url: https://bugzilla.redhat.com/1569926

2020-04-23  Bell Levin  <blevin@redhat.com>

	tests: Drop py2 unicode tests
	Since py2 was dropped, those tests are always skipped.

	net, CI: Add containerized unit tests
	Add the possibility to run the unit tests in a container.

	This makes it possible to run the tests in a fixed environment,
	from sources.

	net: Add mocking of bond defaults for unit tests
	Mocking the bonding files creates an independent
	environment for unit testing that does not depend on
	'/var/run/vdsm'.

	By making the unit tests use this fixture, it is clear
	that all tests use the bonding defaults generation.
	The singular fixtures were removed from the conftest.py
	of the specific test folders, and autouse was enabled in the
	network tests' root conftest.py.

2020-04-22  Amit Bawer  <abawer@redhat.com>

	blockSD: Don't break for bad md tag on occupied_metadata_slots()
	This allows gathering occupied metadata slots for the
	dump volumes API in a robust mode: instead of breaking
	the lvs scan due to a bad metadata volume tag, a warning
	is logged.

	Bug-Url: https://bugzilla.redhat.com/1557147

	blockSD: Make occupied_metadata_slots a module function
	This will make testing easier for next change on this logic.

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-04-21  Benny Zlotnik  <bzlotnik@redhat.com>

	sd: log backup SD activation

2020-04-21  Ales Musil  <amusil@redhat.com>

	net, tests, nmstate: Enable test_create_network_and_reuse_existing_owned_bridge
	This test still fails on legacy.

	net, tests: Enable stable link monitor with nmstate
	On legacy the link was stable and should not actually be skipped.

	Links created by nmstate are stable since 0.2.9 and we can
	enable the monitor for them as well.

	OVS is still skipped because some tests fail with it.

2020-04-20  Milan Zamazal  <mzamazal@redhat.com>

	gluster: Keep seclabel in network GlusterFS drives
	Engine adds seclabels to those drives.  However on disk hot unplug,
	the drive XML is parsed and then built again, without inserting the
	seclabel again for any network drive, according to the logic in
	storage._getSourceXML.

	Let's fix that by adding a check for network GlusterFS drives to
	storage._getSourceXML stating that we need seclabels for them.

	Bug-Url: https://bugzilla.redhat.com/1824805

2020-04-20  Lucia Jelinkova  <ljelinko@redhat.com>

	virt: report numa stats with hugepages
	This patch adds hugepages statistics to the numa node statistics.

	Bug-Url: https://bugzilla.redhat.com/1812316

2020-04-19  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: allow performing incremental backup operation
	Remove prevention from performing an incremental backup operation and
	add 'incremental' attribute to domainbackup XML to mark the backup as
	incremental backup.

	Incremental backup is not fully supported at this stage; the operation
	is valid as long as the VM is still running. When the VM goes down or
	crashes, the whole backup (and checkpoint) chain cannot be used anymore,
	and a full backup needs to be taken to start a new chain.

	Incremental backup will be fully supported once the API between the
	Engine and VDSM for redefining the checkpoints when the VM starts
	is implemented.

2020-04-17  Nir Soffer  <nsoffer@redhat.com>

	virt: backup: Grammar fixes in comments

	virt: migration: Fix evil copy and paste
	While touching the comment, fix also other grammar issues copied from
	the original code.

2020-04-17  Amit Bawer  <abawer@redhat.com>

	health: Move the lvm stats logging into _check_lvm_stats
	Logging lvm stats from blockSD.selftest() is done per domain.
	For multiple domains, same stats log would be duplicated.
	Moving the lvm stats logging into health monitor checks
	will log them once per 5 minutes by default.

	health: Set dedicated config section and defaults
	Move health monitor config setting from 'devel' section
	into new 'health' section of its own.

	The health monitor config is set to enabled by default
	to report vdsm resources metrics and also log lvm cache
	stats which will be added to it on next patch.

	The default monitoring interval is increased from 60 to 300
	seconds not to flood the logs.

2020-04-16  Liran Rotenberg  <lrotenbe@redhat.com>

	virt: remove _dom abuse in migration
	The VM object holds _dom as a private parameter.
	We shouldn't use it directly from outside classes.

	This patch refactor migration module and VM class
	to remove this abuse.

2020-04-16  Eyal Shenitzky  <eshenitz@redhat.com>

	backup_test: add tests for backup with checkpoint creation

	fakedomainadapter: add fake checkpointLookupByName method
	FakeDomainAdapter now needs to mock checkpointLookupByName() to support
	tests for backup with checkpoints.

	backup: add checkpoint xml to start_backup and backup_info result
	Return the backup checkpoint XML in start_backup and backup_info calls.

	Result example:
	{
	    "disks": {...},
	    "checkpoint": "xml..."
	}

	The "checkpoint" key may be missing if:
	  1. There is no checkpoint for the backup - contains RAW disks only.
	  2. Failure while fetching the checkpoint XML from libvirt.

	backup: add checkpoint_id parameter to backup_info verb
	The checkpoint_id parameter is needed in order to fetch the backup
	checkpoint XML; the fetching will be implemented in a later patch.

	The checkpoint_id parameter is optional since there is an option to
	create a backup that contains RAW disks only and that backup will not
	include a checkpoint.

2020-04-16  Nir Soffer  <nsoffer@redhat.com>

	spec: Fix broken bug URL

2020-04-15  Ales Musil  <amusil@redhat.com>

	net, nmstate: DHCP monitoring
	Move nmstate DHCP monitoring to utilize the NM dispatcher
	service. This adds the possibility to monitor any
	interface change and adds support for IPv6
	source routing.

	Bug-Url: https://bugzilla.redhat.com/1690485

	net, tests: Add general wait_for function
	Refactor the dhcpv4 handling post setup by extracting
	the "wait" functionality to its own function.
	The extracted function is targeted to be reused in
	following changes.

	Bug-Url: https://bugzilla.redhat.com/1690485

	net, nmstate: Add NM dispatcher script
	Add script that will be executed by NM
	on certain actions. This script is a part
	of the new DHCP monitoring.

	Inside the script, check for new connections or
	dhcp address changes and send any changes to the monitor
	via a unix socket.

	Bug-Url: https://bugzilla.redhat.com/1690485

	net: Add dhcp monitor module
	Add dhcp monitor that listens on unix socket
	for connection in preparation for the NM
	dispatcher service.

	In addition, a monitored item pool is added
	to keep track of which interfaces are monitored.

	This monitor adds the possibility to manage
	notifications from DHCP events without the need
	for files and inotify. The monitor itself is generic
	enough to be extended for any event if needed.

	Bug-Url: https://bugzilla.redhat.com/1690485

2020-04-14  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: get list of PCI devices and drivers
	Report PCI devices (identified by vendor and device ID) and associated drivers.
	This is currently implemented only in qemu-ga for Windows. Our intended use is
	to detect and notify the user in case any of the drivers could be updated from
	an available virtio-win.iso.

	$ vdsm-client Host getAllVmStats
	[
	    {
	        "appsList": [
	            "qemu-guest-agent-101.1.0"
	        ],
	        "pci_devices": [
	            {
	                "device_id": 4096,
	                "driver_date": "2019-08-12",
	                "driver_name": "Red Hat VirtIO Ethernet Adapter",
	                "driver_version": "100.80.104.17300",
	                "vendor_id": 6900
	            },
	            {
	                "device_id": 256,
	                "driver_date": "2019-04-11",
	                "driver_name": "Red Hat QXL controller",
	                "driver_version": "10.0.0.19000",
	                "vendor_id": 6966
	            },
	            ...
	        ],
	        ...
	    }
	]

2020-04-14  Milan Zamazal  <mzamazal@redhat.com>

	virt: Fix encoding problems in the libvirt hook
	Migrations of VMs with non-ASCII characters in their names fail in the
	libvirt migration hook with errors such as:

	  xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 15

	The problem is that libvirt sets LC_ALL=C before calling the hook and
	that affects sys.stdin and sys.stdout.  Changing the locale later, in
	the Python script, doesn't help.  Then Python XML processing assumes
	ASCII input and complains on non-ASCII characters, possibly unless an
	XML declaration with an explicitly specified encoding is
	present (which is not the case with libvirt domain XMLs).

	Handling it on the Python level is quite tricky; there are at least the
	following problems:

	- sys.stdin must be converted to a UTF-8 stream using
	  io.TextIOWrapper to get the input encoding set correctly.
	- The resulting transformed tree fails in its `write' method due to
	  mixture of bytes and strings within it.  The tree must be exported
	  using ElementTree.tostring.
	- sys.stdout must be tricked by using io.TextIOWrapper too to avoid
	  complaints about non-ASCII characters.
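	As an illustration of the first point, the stdin re-wrapping would
	look roughly like this (a sketch of the rejected Python-level
	approach, with a hypothetical read_domxml() helper; not what the
	patch does):

```python
import io
import xml.etree.ElementTree as ET


def read_domxml(binary_stream):
    """Decode the hook's input as UTF-8 regardless of the LC_ALL=C
    locale libvirt sets, then parse the domain XML. In the hook,
    binary_stream would be sys.stdin.buffer."""
    wrapper = io.TextIOWrapper(binary_stream, encoding="utf-8")
    return ET.fromstring(wrapper.read())
```

	Exporting the transformed tree would then use ET.tostring() rather
	than tree.write(), per the second point above.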

	Rather than adding the complexity to the Python script, let's simply
	override LC_ALL and set it to C.UTF-8 in the script wrapper.  This
	shouldn't cause any problems and it makes Python XML processing happy
	when using non-ASCII characters.

	Bug-Url: https://bugzilla.redhat.com/1795206

	logging: Set UTF-8 encoding for file loggers
	If there is a log message containing non-ASCII characters then the log
	message is silently dropped.  An example of this problem is a VM with
	non-ASCII characters in its name.  Information such as the domain XML
	or errors citing the VM name is not logged.

	Let's pass `encoding' argument to WatchedFileHandler to fix that.

	Bug-Url: https://bugzilla.redhat.com/1795206

2020-04-14  Amit Bawer  <abawer@redhat.com>

	lvm: Refactor for cache stats property
	Move the stats property next to init for readability
	and use it from within the member methods in case
	lazy initialization will be introduced. [1]

	[1] https://gerrit.ovirt.org/#/c/108107/5/lib/vdsm/storage/lvm.py@897

2020-04-10  Ales Musil  <amusil@redhat.com>

	net: Fix source route change
	Remove source route every time gateway
	is specified. This way we can prevent issues
	with routes that were kept because of
	gateway change. Moving from dynamic
	to static caused the same issue.

	Bug-Url: https://bugzilla.redhat.com/1821309

2020-04-09  Nir Soffer  <nsoffer@redhat.com>

	automation: Install ovirt-imageio-common
	The tests depend only on ovirt-imageio-common, so we don't need to
	install ovirt-imageio-daemon on the slaves.

	Version 2.0.3 was released so we don't need to install 2.0.2.

2020-04-09  Michal Skrivanek  <michal.skrivanek@redhat.com>

	spec: require libvirt with live snapshot fix
	Fixing https://bugzilla.redhat.com/1820016, only in Advanced Virt 8.2

	Bug-Url: https://bugzilla.redhat.com/1820068

2020-04-09  Nir Soffer  <nsoffer@redhat.com>

	spec: Requires ioprocess 1.4.1
	This version fixed compatibility with gluster shard. Without this fix
	creating gluster storage domain may fail with:

	    FileNotFoundError: [Errno 2] No such file or directory

	Bug-Url: https://bugzilla.redhat.com/1820283

2020-04-08  Benny Zlotnik  <bzlotnik@redhat.com>

	qemuimg,tests: add format to qemuimg.compare
	As letting qemu-img probe the image format automatically
	is unsafe, this patch adds '-f' for the first image and
	'-F' for the second image

2020-04-08  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: drop partition number from device path
	What qemu-ga returned was inconsistent with what we returned with the
	ovirt guest agent. Instead of mapping disk devices to serial
	numbers, we in fact mapped a partition, e.g. /dev/sda1 instead of
	/dev/sda.

	The current behavior did not make much sense anyway. Suppose there were
	two partitions with filesystems on the disk, /dev/sda1 and /dev/sda2;
	then either of those could get associated with the disk serial number.

	Bug-Url: https://bugzilla.redhat.com/1793290

2020-04-07  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.13

2020-04-07  Benny Zlotnik  <bzlotnik@redhat.com>

	qemuimg: use -B to specify backing file without creation
	Options passed to `qemu-img convert` with '-o' are ignored
	when '-n' is passed. To ensure the backing file is not ignored,
	this patch uses the '-B' flag to specify the backing file.

	Bug-Url: https://bugzilla.redhat.com/1819125
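	The resulting command line can be sketched as follows (a sketch with
	a hypothetical convert_cmd() helper; the real qemuimg wrapper may
	differ):

```python
def convert_cmd(src, dst, dst_format="qcow2", backing=None):
    """Build a `qemu-img convert` command for an existing target.
    With '-n' (no target creation), '-o backing_file=...' is ignored,
    so the backing file must be passed with '-B' instead."""
    cmd = ["qemu-img", "convert", "-p", "-t", "none", "-n",
           "-O", dst_format]
    if backing is not None:
        cmd += ["-B", backing]
    return cmd + [src, dst]
```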

	tests: add a failing test for qemuimg.convert
	When using `qemu-img convert` with the '-n' flag, options passed
	with '-o' are ignored. As a result the target image is created as
	if it has no backing file, and the entire data of the backing file
	is duplicated[1]. If the target is a block domain, the LV created
	will not even have enough space to hold the data from underlying
	chain.

	This patch adds a failing test to demonstrate the issue.

	[1]
	$ qemu-img create -f qcow2 base 10G
	Formatting 'base', fmt=qcow2 size=10737418240 cluster_size=65536
	lazy_refcounts=off refcount_bits=16

	$ qemu-io -c 'write 0 100M' base
	wrote 104857600/104857600 bytes at offset 0
	100 MiB, 1 ops; 00.28 sec (358.392 MiB/sec and 3.5839 ops/sec)

	$ qemu-img create -f qcow2 overlay2 -b base 10G
	Formatting 'overlay2', fmt=qcow2 size=10737418240 backing_file=base
	cluster_size=65536 lazy_refcounts=off refcount_bits=16

	$ qemu-img convert -p -t none -T none -n -f qcow2 overlay -O qcow2 overlay2

	$ qemu-img info overlay2
	image: overlay2
	file format: qcow2
	virtual size: 10 GiB (10737418240 bytes)
	disk size: 101 MiB
	(the full 100 MiB was copied although no data was written to the overlay)
	cluster_size: 65536
	backing file: base
	Format specific information:
	    compat: 1.1
	    lazy refcounts: false
	    refcount bits: 16
	    corrupt: false

	Bug-Url: https://bugzilla.redhat.com/1819125

	qemuimg: add qemuimg.compare
	Add qemuimg.compare; the method will run `qemu-img compare`
	on two images and will be used to tell whether they are identical.
	The `strict` option compares size and allocation as well
	as content.

	Bug-Url: https://bugzilla.redhat.com/1819125
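	A sketch of the underlying command construction (hypothetical
	compare_cmd() helper; the real qemuimg.compare signature may differ):

```python
def compare_cmd(img1, img2, img1_format=None, img2_format=None,
                strict=False):
    """Build a `qemu-img compare` command. In strict mode ('-s'),
    qemu-img also fails on size and allocation differences, not
    only on content differences."""
    cmd = ["qemu-img", "compare"]
    if strict:
        cmd.append("-s")
    if img1_format:
        cmd += ["-f", img1_format]
    if img2_format:
        cmd += ["-F", img2_format]
    return cmd + [img1, img2]
```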

2020-04-07  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.12

2020-04-06  Milan Zamazal  <mzamazal@redhat.com>

	api: Retrieve VM id from the XML on VM creation
	Engine doesn't send the vmID parameter in the VM.create API call
	anymore.  That means that the API.VM._UUID and vmId (note the little
	identifier difference from vmID) parameters are None.

	One known problem with that is the checks for an already running VM in
	API.py and clientIF.py don't work -- they always pass because None key
	is never present in vmContainer.  If anything attempts to start an
	already running VM then Vdsm tries to start it again rather than
	rejecting the attempt early.  It will fail sooner or later, but the
	original Vm instance in vmContainer may get replaced with the failed
	Vm instance in the meantime, shadowing the original Vm instance.  Then
	the real VM will keep running untracked by Vdsm.

	Let's set API.VM._UUID and vmId from the domain XML in
	API.VM.create (other API.VM calls can still get vmId from Engine).

	Related-To: https://bugzilla.redhat.com/1816327

2020-04-06  Shani Leviim  <sleviim@redhat.com>

	blockVolume: parse slot into an int
	Before commit 6612cd40432f5864976fc5c6129031d64779a1f1,
	the 'slot' parameter was parsed as an int.
	Currently, it is parsed as a string, which may cause an error:
	TypeError: unsupported operand type(s) for +: 'int' and 'str'.

	Bug-Url: https://bugzilla.redhat.com/1680368
	Bug-Url: https://bugzilla.redhat.com/1819098

2020-04-06  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: create checkpoint when starting full vm backup
	The first step towards incremental backup is to create
	a checkpoint when the backup begins.
	The checkpoint will mark a point in time that instructs
	Libvirt and Qemu to track the diffs from this checkpoint
	to the next one in the chain.

	When the user starts a backup, a new checkpoint is
	added automatically.

	The checkpoint XML structure is as follows:

	<domaincheckpoint>
	    <name>unique_id</name>
	    <description>checkpoint for backup 'backup_id'</description>
	    <parent>
	        <name>parent_id</name>
	    </parent>
	    <disks>
	        <disk name='vda' checkpoint='bitmap'/>
	        <disk name='vdb' checkpoint='no'/>
	    </disks>
	</domaincheckpoint>

	A checkpoint is supported for Qcow2 disks only. Therefore,
	for raw disks, the checkpoint attribute will be set to 'no'
	instead of 'bitmap' according to the engine request.
	This will instruct Libvirt to skip the checkpoint creation for
	those disks.

2020-04-06  Nir Soffer  <nsoffer@redhat.com>

	docker: Remove non-existing targets
	The fedora-29 and centos-7 dockerfiles were removed a long time ago,
	but the makefile was not updated.

	docker: Add ovirt-imageio-common
	Since we moved to ovirt-imageio 2.0, pylint fails in travis, since we
	did not add the package to the containers:

	lib/vdsm/kvm2ovirt.py:30:0: E0401: Unable to import 'ovirt_imageio' (import-error)

	Add the package to the images.

2020-04-06  Marcin Sobczyk  <msobczyk@redhat.com>

	tool: libvirt: Dynamic socket requirements
	Since libvirt switched to socket activation, one needs to enable
	its socket units to be able to connect with TCP/TLS.

	Until now, we simply added the requirements to socket units
	to supervdsm's service definition. This approach however had two flaws:
	- we required both TCP and TLS sockets at all times
	- it didn't guarantee that the socket units would be up before
	  'libvirtd.service', and socket units fail to start if the daemon
	  is already running

	This patch changes the approach to a more dynamic one. During
	reconfiguration of libvirt module with 'vdsm-tool', depending on the
	configuration of vdsm (ssl/non-ssl), we inject a requirement to the
	appropriate socket unit on the 'libvirtd.service' itself with systemd.
	This is done by creating a symlink like:

	 /etc/systemd/system/libvirtd.service.requires/libvirtd-tls.socket --> /usr/lib/systemd/system/libvirtd-tls.socket

	This both gives us the guarantee, that during next start
	of 'libvirtd.service' the socket will also be up, and makes only
	TCP or TLS socket required.

	During deconfiguration we remove the symlinks.

	Since tests for all 'vdsm-tool' configurators are still written in nose,
	a significant amount of time will be needed to adjust them to the new
	way of working. This patch disables all currently failing tests and they
	will be reenabled later.

	Bug-Url: https://bugzilla.redhat.com/1818554

	tool: libvirt: Drop support for libvirt without socket activation
	We depend on libvirt versions that have socket activation for quite some
	time [1], but the support for non-socket activation versions is still
	available in vdsm-tool's libvirt configurator.

	This patch removes any leftovers of non-socket-activation libvirt
	support.

	[1] https://github.com/oVirt/vdsm/commit/05902304ea21f7fcc7f5340b989c7cde8e49a849

	Bug-Url: https://bugzilla.redhat.com/1818554

	tool: libvirt: Remove deprecated libvirt settings
	We depend on libvirt versions that have socket activation for quite some
	time [1]. Per the documentation in '/etc/libvirt/libvirtd.conf',
	it seems we still provide some settings to libvirt's configuration files
	that are not respected when the socket activation is used. This patch
	drops these settings.

	[1] https://github.com/oVirt/vdsm/commit/05902304ea21f7fcc7f5340b989c7cde8e49a849

	Bug-Url: https://bugzilla.redhat.com/1818554

	tool: libvirt: Don't shutdown libvirtd on timeout
	By default 'libvirtd' starts with '--timeout 120' flag. The effect
	of this flag is that after 120 secs, if no client has made a connection
	to the daemon, it will shut itself down.

	This is the essence of the socket activation feature. However, given
	the problematic fact that libvirt's socket units cannot be started
	after the daemon is up, and the multiple scenarios where we juggle the
	services (e.g. the host deployment process), it's better to keep
	things simple and let 'libvirtd' stay up once it's up.

	Bug-Url: https://bugzilla.redhat.com/1818554

	systemd: Do not enable vdsmd after installation
	Straight after installation vdsm is not really able to function
	properly - it needs at least a 'vdsm-tool configure' run first.

	Currently, when installing vdsm, we put a '85-vdsmd.preset' file
	inside systemd's preset directory that causes vdsm to be enabled
	by default - this patch removes this file. This will cause
	the distro-provided '99-default-disable.preset' preset to be in effect,
	which disables everything by default.

	Bug-Url: https://bugzilla.redhat.com/1818554

2020-04-03  Nir Soffer  <nsoffer@redhat.com>

	lvm: Move _isMetadataPv() to PV.is_metadata_pv()
	Now that we have a class, we can group related functions to the class.

	lvm: Rename make{PV,VG,LV}() to {PV,VG,LV}.fromlvm()
	The code is now located where it should be. More work is needed to
	move parsing and validation code from _reload{pvs,vgs,lvs}().

	lvm: Get rid of the isinstance checks
	Implement is_stale() in all cache items classes, so we can replace the
	type checks:

	    if isinstance(obj, Stale):

	with proper code:

	    if obj.is_stale():
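	A minimal sketch (hypothetical class and field names, not vdsm's
	actual cache code) of what this polymorphic replacement looks like:

```python
from collections import namedtuple

class Stale(namedtuple("Stale", "name")):
    # Placeholder stored in the cache when an item is invalidated.
    def is_stale(self):
        return True

class LV(namedtuple("LV", "name vg_name")):
    # A real cache item is never stale by definition.
    def is_stale(self):
        return False

def needs_reload(items):
    # Polymorphism replaces the isinstance(obj, Stale) type check.
    return any(item.is_stale() for item in items)

print(needs_reload([LV("lv1", "vg1")]))                # False
print(needs_reload([LV("lv1", "vg1"), Stale("lv2")]))  # True
```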

	lvm: Rename Stub to Stale
	This namedtuple is used as a placeholder when we invalidate an item in
	the cache. Rename to make the intent of this class clear.

	lvm: Remove unused "stale" field
	The Stub namedtuple is always created with stale=True, and the field is
	never accessed; remove it.

2020-04-02  Ales Musil  <amusil@redhat.com>

	net: Gracefully shutdown the dhcp monitor on vdsm stop

2020-04-01  Ales Musil  <amusil@redhat.com>

	virt, net: Wait for link up after hostdev reattach
	Some SR-IOV drivers perform operations asynchronously, and on
	hotunplug libvirt sends an event to confirm the operation. However,
	subsequent steps are taken by the kernel until the pci device is
	visible, which results in wrongly reported refresh caps.

	Wait for the pci device link to be actually up before
	returning from operation.

	Bug-Url: https://bugzilla.redhat.com/1817001

	net: Add pci link up monitor
	Bug-Url: https://bugzilla.redhat.com/1817001

2020-04-01  Amit Bawer  <abawer@redhat.com>

	lvm: Move lvm cache stats into a class of its own

	lvm: Fix over-indent in deactivateUnusedLVs

2020-04-01  Nir Soffer  <nsoffer@redhat.com>

	travis: Modernize build
	Remove the "sudo: required" settings since it is not required any more
	and has no effect.

	Replace env:matrix with jobs, which is simpler, more powerful, and
	better matches the way we use travis.

2020-03-31  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.11

2020-03-31  Vojtech Juranek  <vjuranek@redhat.com>

	rpm: require imageio 2.0
	The new major version of imageio, imageio 2.0, introduces a couple of
	important changes. One of them is getting rid of the hardcoded
	dependency on vdsm. imageio now runs under its own user, doesn't
	provide any default daemon config file with TLS set up to use vdsm
	certificates, and creates the ticket socket in its own directory. The
	new version also refactored imageio, so imports from imageio need to
	be updated.

	After switching to imageio 2.0 we need to:
	* add imageio into kvm group, so that it can read/write to nbd socket
	* add imageio into qemu group, so that it can read/write images, which
	  are owned by qemu
	* add vdsm into imageio group, so that it can read/write imageio control
	  socket
	* provide our own configuration of imageio daemon
	* fix imageio socket path in imagetickets module
	* fix kvm2ovirt module to use proper import
	* update imageio service name in vdsm service file

	Include a default imageio daemon config file and install it during
	vdsm installation. Also include imageio and vdsm in the relevant
	groups, and reflect the changes described above in the affected vdsm
	modules.

2020-03-31  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.10

2020-03-31  Ales Musil  <amusil@redhat.com>

	net, nmstate: Remove 3 DNS entries workaround
	Since nmstate 0.2.9 we can use 3 DNS entries.

	Bug-Url: https://bugzilla.redhat.com/1816043

2020-03-31  Artur Socha  <asocha@redhat.com>

	betterAsyncore: Proper handling of SSL_ERROR_WANT_WRITE
	This patch provides a fix for the scenario where there is a 'huge'
	Host.getCapabilities response over SSL.
	In the test scenario, only ~200 KB was needed to trigger the error.
	Previously, only part of the data was written over the socket before
	the connection was closed because of an unhandled
	ssl.SSLWantWriteError.

	The issue has been described at [1] and [2]

	Additionally, the order of exception handlers has been corrected so
	that 'sslUtils.SSLError' won't get swallowed: 'OSError' is a
	superclass of 'sslUtils.SSLError', and 'socket.error' is an alias for
	'OSError':

	$ python3
	>>> import socket
	>>> assert socket.error is OSError

	[1] https://bugs.python.org/issue33307
	[2] https://docs.python.org/3/library/ssl.html#ssl-nonblocking

	Bug-Url: https://bugzilla.redhat.com/1766193
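	A sketch of why the handler order matters, using the stdlib
	ssl.SSLError as a stand-in for vdsm's sslUtils.SSLError (both are
	OSError subclasses): a handler for OSError listed first would swallow
	the SSL error.

```python
import socket
import ssl

# These class relationships are what makes handler order matter:
assert socket.error is OSError
assert issubclass(ssl.SSLError, OSError)

def classify(exc):
    # Most specific handler first, matching the corrected order.
    try:
        raise exc
    except ssl.SSLError:
        return "ssl"
    except OSError:
        return "socket"

print(classify(ssl.SSLError()))          # ssl
print(classify(ConnectionResetError()))  # socket
```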

2020-03-30  Nir Soffer  <nsoffer@redhat.com>

	build: Remove --disable-ovirt-imageio option
	This option was added when vdsm was in Fedora, and ovirt-imageio
	packages were not available. Vdsm is not part of Fedora now, and is not
	an optional component, so we don't need this code.

2020-03-30  Amit Bawer  <abawer@redhat.com>

	lvm: Refactor getLv()
	1. Update comments.

	2. Return the resulting lv/lvs per case instead of a common res value.

	lvm: Use _lvs_needs_reload helper for reloadlvs check in getLv(vgName)
	This also fixes a wrong check that matched lv stubs not belonging to
	the queried vg.

	lvm: Add lvm cache statistics
	Log lvm cache hit ratio percentage = 100 * hits/(misses+hits):

	INFO     storage.LVM:lvm.py:912 Cache hit ratio: 22.29% (hits: 35 calls 157)

	Stats are printed periodically within BlockSD.selftest()
	(the default period is 5 minutes) and from cleanup of the tmp_storage
	fixture.

	This will also serve tests checking for hits and misses.
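	The logged statistic can be sketched as follows (hypothetical helper
	name; the formula is the one given above, 100 * hits/(misses+hits)):

```python
# Hit ratio as a percentage; 'calls' is hits + misses.
def hit_ratio(hits, misses):
    calls = hits + misses
    return 100.0 * hits / calls if calls else 0.0

# 35 hits out of 157 calls, as in the example log line above.
print("Cache hit ratio: %.2f%% (hits: %d calls %d)"
      % (hit_ratio(35, 122), 35, 35 + 122))
# Cache hit ratio: 22.29% (hits: 35 calls 157)
```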

2020-03-30  Liran Rotenberg  <lrotenbe@redhat.com>

	virt: add safety check on md
	When cleaning the VM's metadata from the snapshot job, the snapshot
	data may have already been cleaned up by another flow or thread. With
	bad timing, this raises a KeyError.

	If the metadata is missing, the job is done and cleaned up, and no
	other usage is expected for it. Therefore, catching the KeyError and
	doing nothing handles this correctly.

2020-03-30  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Added 'libblockdev-plugins-all' as dependency for gluster-vdsm
	Currently 'libblockdev-plugins-all', which is required by
	gluster-mgmt, is missing from the python-blivet packages.
	Bug-Url: https://bugzilla.redhat.com/1810910

2020-03-30  Ales Musil  <amusil@redhat.com>

	net, tests, nmstate: Remove xfail from dynamic IP with static DNS
	This should be fixed since nmstate 0.2.9.

	Depends-On: https://bugzilla.redhat.com/1815112

2020-03-29  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: add 'checkpoint' property for each backup disk
	Add 'checkpoint' = true/false for each disk that participates
	in a VM backup.
	This information will guide VDSM on which disk should be included in the
	checkpoint creation for incremental VM backup.

	The disk will only be included in the checkpoint if the disk format is
	'cow', we cannot add bitmaps to raw disks.

	So now the request data from Engine to VDSM contains the following
	structure:
	{
	    backup_id: backup-1,
	    disks: [
	        {
	            "img_id": "disk1",
	            "domain_id": "domain1",
	            "volume_id": "volume1",
	            "checkpoint": true/false,
	        },
	        {
	            "img_id": "disk2",
	            "domain_id": "domain2",
	            "volume_id": "volume2",
	            "checkpoint": true/false,
	        },
	    ],
	    from_checkpoint_id: "from_checkpoint_id",
	    to_checkpoint_id: "to_checkpoint_id"
	}

	The 'checkpoint' property will be used in the followup patch.
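	The inclusion rule described above can be sketched like this
	(hypothetical helper and dict keys, not vdsm's actual data model):

```python
# Only 'cow' (qcow2) disks get a checkpoint; bitmaps cannot be
# added to raw disks.
def to_checkpoint_flag(disk_format):
    return disk_format == "cow"

disks = [{"img_id": "disk1", "format": "cow"},
         {"img_id": "disk2", "format": "raw"}]
for disk in disks:
    disk["checkpoint"] = to_checkpoint_flag(disk["format"])

print([(d["img_id"], d["checkpoint"]) for d in disks])
# [('disk1', True), ('disk2', False)]
```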

2020-03-27  Milan Zamazal  <mzamazal@redhat.com>

	virt: Use libvirt migration parameter for host name validation
	Encrypted migrations check that the destination certificate matches
	the migration destination.  However, this is not always true, for
	instance when using a migration network IP address as destination, not
	present in the destination host certificate.  We had to implement a
	workaround that always uses the standard host name or IP address as
	the encrypted migration destination, ignoring migration networks or
	other settings.  Otherwise encrypted migrations would fail completely
	in such situations.

	libvirt now provides a new migration parameter,
	VIR_MIGRATE_PARAM_TLS_DESTINATION, that can be used to set the
	expected host name in the destination certificate.  Let's use it and
	remove the former workaround.

	Bug-Url: https://bugzilla.redhat.com/1739557

2020-03-26  Tomáš Golembiovský  <tgolembi@redhat.com>

	qga: add libvirt backend
	Since libvirt 5.9.0 there is a new API for querying the QEMU Guest
	Agent. Let's make use of it.

	Bug-Url: https://bugzilla.redhat.com/1680398

	qga: do not query caps in every iteration
	When the VM is running, we should honor 'qga_info_period'. Only when
	the VM has just booted does it make sense to ignore that and check
	more often until we see the agent.

	Bug-Url: https://bugzilla.redhat.com/1680398

	qga: return default dict with capabilities
	We used to return None if the VM was not yet checked or when the ID was
	invalid. But the caller could not reliably distinguish the situations
	making the behavior pretty useless. It also complicates some conditions
	unnecessarily. Let's return a dictionary in all cases (even for invalid
	VM IDs). The caller should still check the version for None to see
	whether qemu-ga was reached or not yet.

	qga: don't use private _dom attribute

	qga: rewrite poller
	Huge rewrite of the poller. The original version had several design
	flaws. It took too long for the initial polling of a started VM to
	happen, meaning the VM could spend several minutes without any stats
	reported to the engine. This is not something that could be easily
	fixed with the old code. Another problem was caused by the threading:
	the fact that the queries were grouped by qemu-ga calls caused
	congestion on a VM. One call on a VM caused calls from other threads
	to fail for the same VM.

	All the various Operations were merged into a single Operation that is
	scheduled often (every 5 seconds in the default configuration). All
	the qemu-ga commands for all VMs are queried from this operation.

	Bug-Url: https://bugzilla.redhat.com/1680398

2020-03-26  Benny Zlotnik  <bzlotnik@redhat.com>

	image,blockSD,fileSD: add is_block()
	is_block will provide a cleaner way to check the
	type of the storage domain

2020-03-26  Marcin Sobczyk  <msobczyk@redhat.com>

	tox: Remove usage of non-existent 'xpass' mark
	A non-existent 'xpass' mark was used to annotate two tests.
	This patch replaces its usage with TODOs.

2020-03-26  Nir Soffer  <nsoffer@redhat.com>

	api: Use safe yaml loader
	The default YAML loader is unsafe, supporting misfeatures like
	arbitrary code execution[1]. The module now provides SafeLoader and
	CSafeLoader, avoiding these issues.

	[1] https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
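	A sketch of the difference (not vdsm's actual call sites): safe_load()
	uses SafeLoader, which rejects arbitrary-object construction tags.

```python
import yaml

# Plain data loads fine with the safe loader.
print(yaml.safe_load("port: 54321")["port"])  # 54321

# A document abusing python object tags is rejected instead of
# executing code.
try:
    yaml.safe_load("!!python/object/apply:os.system ['echo pwned']")
except yaml.YAMLError:
    print("rejected")
```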

2020-03-26  Bell Levin  <blevin@redhat.com>

	net, tests: Remove network markers
	Global markers were added in the tox.ini file instead.

2020-03-25  Vojtech Juranek  <vjuranek@redhat.com>

	nbd: make nbd socket writable for whole group
	In the followup patches we will switch to the new imageio version. The
	new imageio runs under its own user, which will be part of the vdsm
	group. However, by default, sockets are writable only by the user. As
	imageio also needs write access to nbd sockets, make nbd sockets
	writable by the whole vdsm group. Set the mode to 0o660 as there's no
	need to make the socket readable by all other users.
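	A sketch of the permission change, using a scratch path instead of
	vdsm's real socket directory: group gets read/write access, other
	users get nothing.

```python
import os
import socket
import tempfile

path = os.path.join(tempfile.mkdtemp(), "nbd.sock")
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.bind(path)
os.chmod(path, 0o660)  # rw-rw---- : user and group only
print(oct(os.stat(path).st_mode & 0o777))  # 0o660
sock.close()
```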

2020-03-25  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.9

2020-03-25  Tomáš Golembiovský  <tgolembi@redhat.com>

	v2v: make regular expressions raw strings

2020-03-25  Benny Zlotnik  <bzlotnik@redhat.com>

	storage,tests: add -n flag when copying to block SD
	On new qemu-img (>= 4.2.0-15), converting will fail when
	it tries to create the image before copying with:
	"qemu-img: Protocol driver 'host_device' does not support image creation"

	This patch adds the "-n" flag when copying to a block SD
	since we create the image before anyway.

	Related-To: https://bugzilla.redhat.com/1816007
	Bug-Url: https://bugzilla.redhat.com/1816004

2020-03-25  Ales Musil  <amusil@redhat.com>

	net, nmstate: Fix dynamic with static DNS state
	Send auto-dns false when we require static DNS with
	dynamic IP configuration.

	Bug-Url: https://bugzilla.redhat.com/1812914

2020-03-25  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Fixed parameter types in volumeSetHelpXml()

2020-03-24  Amit Bawer  <abawer@redhat.com>

	lvm: Remove redundant warning about non-found lv in getLv()
	Remove the redundant log warning about a missing lv in the result,
	which is handled later by the calling lvm.getLV() check raising
	an exception.

	Revert "lvm_test: Add reload lvs tests"
	This reverts commit 7e9fbcfce0e6feb301c1209e9d7a64870bac616f.

	Reload lvs tests will be added to the patch enabling caching
	for lvs.

	For now they are not required.

2020-03-23  Amit Bawer  <abawer@redhat.com>

	lvm_test: Add reload lvs tests
	- test_lv_reload_fresh_vg:

	Check that only vgs with stub lvs invoke reloadlvs for getLv.
	This currently fails with the current lvm cache reload decision,
	so it is marked as xfail, to be fixed in the next patch.

	- test_lv_reload_for_stale_vg:

	Check that reloadlvs is called for a vg when not in freshlv
	for getLv.

2020-03-23  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.8

2020-03-23  Bell Levin  <blevin@redhat.com>

	py3: Register pytest marks
	Pytest marks should from now on be registered; otherwise pytest
	throws the following warning:
	 /usr/local/lib/python3.6/site-packages/_pytest/mark/structures.py:327:
	 PytestUnknownMarkWarning: Unknown pytest.mark.nmstate - is this a typo?
	 You can register custom marks to avoid this warning - for details,
	 see https://docs.pytest.org/en/latest/mark.html
	 PytestUnknownMarkWarning,

	Using a --strict flag in this config will ensure no
	new unregistered markers can be added.

2020-03-23  Nir Soffer  <nsoffer@redhat.com>

	spec: Include epoch in requirement
	yum can fail silently to resolve dependencies if a package specifies
	Epoch, and the Epoch is not specified in the requirement.

	Here is an example:

	    $ yum info lvm2
	    Installed Packages
	    Name         : lvm2
	    Epoch        : 8
	    Version      : 2.03.08
	    Release      : 2.el8
	    ...

	To require a specific version of this package we must use:

	    Requires: lvm2 >= 8:2.03.08-2.el8

	Fix package requirements to include Epoch.

	Related-To: https://bugzilla.redhat.com/1814979

	spec: Require ovirt-imageio 1.6.3
	ovirt-imageio 2.0 is not compatible with current vdsm. Require latest
	release compatible with vdsm.

2020-03-23  Ales Musil  <amusil@redhat.com>

	net, nmstate: Fix 3 DNS servers problem
	Add a temporary workaround so that we won't fail
	when switching from dynamic to static with 3 DNS servers.

2020-03-23  Bell Levin  <blevin@redhat.com>

	net tests: Fix invalid bridge option write
	Using nmstate will raise a setupNetwork error if a non-existent
	option is attempted to be written.

	Currently, as the network scripts are still used, adding an invalid
	option will raise an AssertionError.
	Setting up the network succeeds, since ifup is not able to write
	the option, but it does not fail either. The assertion of the
	running config vs the requested config is the one that eventually
	fails, since the original option was not written in the first place.

	Although nmstate is on the brink of replacing ifcfg scripts,
	tracking this behavior of network-scripts is a good idea.

	net, tests: Remove bridge_opts test from ovs run
	Bridge options are not supported for ovs.

2020-03-19  Bell Levin  <blevin@redhat.com>

	net, tests: Add broken_on_travis decorator
	The following patch [1] introduced a test that includes ipv6,
	which is broken on travis [2].

	[1] https://gerrit.ovirt.org/#/c/106949/
	[2] https://github.com/travis-ci/travis-ci/issues/8361

2020-03-19  Vojtech Juranek  <vjuranek@redhat.com>

	blockSD: Storage domain life cycle management
	This patch introduces new life cycle methods to storage domain:

	- setup         called when storage domain monitor produces the storage
	                domain object, before starting to monitor the domain.

	- teardown      called when a storage domain monitor has finished
	                monitoring.

	The BlockStorageDomain implementation deactivates unused LVs both in
	setup and teardown to avoid stale devices.

	Some LVs can be activated by default, e.g. on FC during boot, which
	would result in many stale devices, slowing down lvm operations.
	This can be solved by setting up an LVM filter, but filter setup is
	optional, so we deactivate stale LVs when we start the domain.

	When a storage domain is deactivated, we currently don't deactivate
	LVs, leaving stale devices around. These LVs must be deactivated by
	vdsm, as e.g. after removing the domain we have no easy way to remove
	them, since the backing physical volumes may already be removed. So we
	also need to deactivate LVs when deactivating the domain.

	This patch is a rebase of patch #56876 (https://gerrit.ovirt.org/56876)

	Bug-Url: https://bugzilla.redhat.com/1544370

2020-03-19  Amit Bawer  <abawer@redhat.com>

	lvm_test: Add test_get_lvs_after_sd_refresh
	Test that a cache reload for one VG's LVs doesn't
	interfere with another's.

2020-03-18  Michal Skrivanek  <michal.skrivanek@redhat.com>

	spec: update Advanced Virt requirements
	Bug-Url: https://bugzilla.redhat.com/1811425

2020-03-18  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: Disable more tests that fail randomly in CI

2020-03-17  Liran Rotenberg  <lrotenbe@redhat.com>

	virt: async snapshot timeout abort
	This patch lets VDSM abort the async snapshot
	operation. When the snapshot job starts, a start
	time value is saved to the VM's metadata.
	A new thread, whose purpose is to abort the running
	job, checks the timeout. It runs in both cases of
	the job: the regular flow, or recovery with the
	initial timeout.
	The default timeout of the snapshot job is set
	to 30 minutes.
	The abort thread sends an abort command to the
	libvirt job, resulting in an ActionStopped VDSM error.

	The snapshot metadata is created in the Job itself;
	the shared metadata is then passed to:
	- the Snapshot class or the Recovery class, alternately
	- the Abort class
	The Abort class uses only the completed/abort variables
	from the metadata.
	A lock exists in the Job's module for reading and writing
	the metadata to the VM. Its main purpose is to prevent
	races on the metadata, in particular the abort and
	success status, which can be changed by parallel
	running threads.

	This functional addition is needed since the VM
	will be locked by the engine while the snapshot
	job runs.
	In version 4.4 the job is async, and we want the
	job to finish in finite time, to release the locking.

	Bug-Url: https://bugzilla.redhat.com/1780943

	snapshot: refactor class attributes
	Changed class attributes to private

	virt: refactor dom abortJob
	Removing the abuse of _dom.abortJob()

2020-03-17  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.7

2020-03-17  Nir Soffer  <nsoffer@redhat.com>

	lvm: Filter LVs by VG name
	When trying to mark unreadable LVs on failure, we iterated over all
	LVs and checked if each LV was a stub using:

	    for lvName in lvNames:
	        if isinstance(self._lvs.get((vgName, lvName)), Stub):

	Change the lvNames generator so it returns only the LVs belonging to
	the specified VG name, saving the dictionary lookup and isinstance
	check for LVs which are not part of the relevant VG.
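	A hypothetical sketch of the changed generator, keyed like the
	(vgName, lvName) cache described above:

```python
# Yield only the LV names that belong to the given VG, so callers
# need no per-LV dictionary lookup or type check.
def lv_names(lvs, vg_name):
    for (vg, lv) in lvs:
        if vg == vg_name:
            yield lv

cache = {("vg1", "lv1"): "a", ("vg1", "lv2"): "b", ("vg2", "lv3"): "c"}
print(sorted(lv_names(cache, "vg1")))  # ['lv1', 'lv2']
```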

2020-03-17  Marcin Sobczyk  <msobczyk@redhat.com>

	tool: Log stdout and stderr as str in exceptions
	The 'ServiceError' class takes stdout and stderr byte streams from
	arbitrary commands as its initialization arguments. In the
	string-conversion method these are incorrectly treated as strings.
	Since we cannot be sure about the contents of the stdout and stderr
	streams, it's better to log them as the string representation of
	bytes objects than to try to decode them to strings.

	Bug-Url: https://bugzilla.redhat.com/1813961
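	A sketch of the fix (hypothetical helper name): formatting with %r
	logs arbitrary bytes safely instead of decoding them.

```python
# repr() of a bytes object never raises; decode() could fail on
# arbitrary command output of unknown encoding.
def format_streams(out, err):
    return "stdout=%r stderr=%r" % (out, err)

print(format_streams(b"ok", b"\xff\xfe"))
# stdout=b'ok' stderr=b'\xff\xfe'
```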

2020-03-17  Bell Levin  <blevin@redhat.com>

	net, tests: move route_to_device test to pytest

	net, tests: Move globals out of the test scope
	For future use in other tests.

2020-03-17  Marcin Sobczyk  <msobczyk@redhat.com>

	abrt: Fix generating core dumps on el8
	Core dump generation is broken on el8. The cause behind the issue
	has been described in one of Nir's patches [1]. That patch partially
	fixed the problem on el8 - we got rid of the bad core_pattern
	that vdsm was applying.

	By default on el8 abrt tries to use the systemd's builtin core dump
	generation mechanism instead of using abrt-ccpp addon. While
	in the long term this is the way we should choose, currently abrt
	documentation mentions one very problematic aspect of using
	abrt + systemd duo [2]:

	> The current version of abrt-journal-core.service needs to make copies
	> of data that ABRT needs to be able to open a report in a bug tracking
	> tool and leaves the systemd-coredumpctl data untouched. That means
	> that you cannot use the ABRT journal core service to clean
	> systemd-coredumpctl and you will end up having two copies of core
	> files, one in systemd-coredumpctl’s storage and one
	> in a sub-directory of /var/spool/abrt/.

	Hopefully abrt will implement a way to avoid the double-core issue.
	For now it's safer to stick with the known and tried abrt-ccpp addon.
	This is why we're adding 'abrt-ccpp.service' dependency to vdsm.

	Fedoras use newer versions of abrt than el8, and there's no abrt-ccpp
	addon anymore - the only possible way of generating core dumps there
	is using systemd's coredumpctl. This is why the service dependency
	is a soft 'Wants=' one.

	Since systemd can be quite persistent in changing the core_pattern
	back to its own 'systemd-coredump', per sysctl.d manual [3] we also
	need to mask the '/usr/lib/sysctl.d/50-coredump.conf' setting file
	by creating a '/etc/sysctl.d/50-coredump.conf' symlink pointing
	to '/dev/null'.

	[1] https://gerrit.ovirt.org/#/c/106048/
	[2] https://abrt.readthedocs.io/en/latest/examples.html#getting-core-files-from-systemd-coredumctl
	[3] http://man7.org/linux/man-pages/man5/sysctl.d.5.html

	Bug-Url: https://bugzilla.redhat.com/1787222
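	The masking mechanism can be sketched as follows, using a scratch
	directory instead of the real /etc: a same-named entry in the
	higher-priority sysctl.d directory that links to /dev/null makes
	systemd-sysctl skip the distro file.

```python
import os
import tempfile

etc_dir = tempfile.mkdtemp()  # stands in for /etc/sysctl.d
masked = os.path.join(etc_dir, "50-coredump.conf")
os.symlink("/dev/null", masked)
print(os.readlink(masked))  # /dev/null
```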

2020-03-17  Nir Soffer  <nsoffer@redhat.com>

	init: Enable coredumps
	Vdsm is overriding /proc/sys/kernel/core_pattern during startup to:

	    |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %e %i"

	But /usr/libexec/abrt-hook-ccpp is not installed since Fedora 26[1],
	so when a program crashes, the coredump is dropped.

	This was added in this commit:

	commit 893ac2a4d610791e26f6debdab8b06f8c36bc18d
	Author: Yeela Kaplan <ykaplan@redhat.com>
	Date:   Mon Jul 6 18:27:47 2015 +0300

	    Adding abrt dependency and introduce configurator for it

	This is wrong; configuring core_pattern is done using
	/usr/lib/sysctl.d/*.conf, and on Fedora it is configured by:

	$ cat /usr/lib/sysctl.d/50-coredump.conf
	...
	kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

	Removing the bad configuration restores core dumps.

	Since coredumps are managed by coredumpctl instead of abrt-addon-ccpp,
	we need to remove the abrt-ccpp configuration, and possibly configure
	coredumpctl to match our policy. Let's start with enabling coredumps,
	since this currently blocks incremental backup work.

	[1] https://fedoraproject.org/wiki/Changes/coredumpctl

	Bug-Url: https://bugzilla.redhat.com/1787222

2020-03-16  Nir Soffer  <nsoffer@redhat.com>

	tox: Increase storage required coverage

2020-03-15  Nir Soffer  <nsoffer@redhat.com>

	lvm: Fix _reloadlvs() return value on errors
	If the underlying lvs command fails, return only the updated LVs,
	instead of the entire LVs cache. Copying all LVs from all VGs is
	incorrect and wasteful.

	The issue is hidden in LVMCache.getLv(), extracting only the specified
	LV from _reloadlvs() return value.

	Bug-Url: https://bugzilla.redhat.com/1553133

	lvm: Fix _reloadlvs() error handling
	If the lvName argument was specified, and the underlying lvs command
	failed,
	we tried to mark stub LVs as unreadable. However we looked up the LVs
	using the LV name instead of (VG name, LV name):

	    if isinstance(self._lvs.get(lvName), Stub):

	This will always return None, so we did not mark anything as unreadable.

	If lvName argument was not specified, _reloadlvs iterated on all LVs in
	the cache - including LVs from other VGs, and marked all stub lvs as
	unreadable.

	Both issues were introduced in the patch adding the Unreadable class:

	commit 344a8c7d5d8a0e56f866385dd9295ea5f2dd1af0
	Author: Eduardo Warszawski <ewarszaw@redhat.com>
	Date:   Wed Apr 6 19:28:24 2011 +0300

	    BZ#684576 - Added Unreadable class for lvm objects that can't be
	    reloaded.

	Fixed both issues by using specified VG and LV names when looking
	up stub LVs.

	Bug-Url: https://bugzilla.redhat.com/1553133

	tests: Add tests for error handling in _reloadlvs()
	Add missing tests for handling underlying "lvs" command errors in
	_reloadlvs().

	Tests for affecting only the specified vg are marked as xfail, since the
	code currently mark all lvs as unreadable.

	This change increases lvm module test coverage from 77% to 80%.

	Bug-Url: https://bugzilla.redhat.com/1553133

	lvm: Fix _reloadpvs() return value on errors
	If the underlying pvs command failed, we returned a dict with all
	the PVs. If the command succeeded, we returned a dict with the updated
	PVs. Now we always return a dict with the updated PVs.

	Bug-Url: https://bugzilla.redhat.com/1553133

	lvm: Fix _reloadvgs() return value on errors
	When the underlying vgs command failed, we marked the relevant vgs as
	unreadable, but we returned the updated VGs, which never include the
	unreadable vgs.

	Now we add the unreadable VGs to the updated VGs, so the return value
	matches the cache contents.

	Bug-Url: https://bugzilla.redhat.com/1553133

	lvm: Fix handling of vgs with empty output
	Before moving to --select, vgs with a single VG name would fail if the
	VG did not exist, marking the VG as Unreadable. This was kind of a
	bug, but we could not detect that the VG was missing, since lvm
	returns the same generic error code for all errors.

	Now that we use --select, the command does not fail, but we get empty
	output. In this case we returned the current cache contents, possibly
	returning stale data. Because we never call _reloadvgs() if the VG was
	not stale, the VG is marked as Stub. Stub and Unreadable are basically
	the same, but we really want to remove non-existing items from the
	cache.

	Fixed by removing the check for empty output, so we always remove stale
	vgs. This fixes getVG() so it does not return stubs for missing VG after
	reload.

	We still process partial output from vgs, so the cache is updated with
	the data we have.

	Bug-Url: https://bugzilla.redhat.com/1553133

	lvm: Fix getAllPvs() and getAllVgs()
	getAllPvs() and getAllVgs() were returning stubs for missing items
	which are not in the cache. Remove the stubs from the returned value.

	Bug-Url: https://bugzilla.redhat.com/1553133

	tests: Test handling of stale cache items
	Add missing tests for handling of stale cache items. This simulates the
	case when PV, VG, or LV are removed on another host.

	The tests handle both kinds of invalidation:
	- Dropping entire cache using lvm.invalidateCache()
	- Invalidating single VG and optionally its LVs and PVs using
	  lvm.invalidateVG()

	The older test for using the cache in lvm.getLV() was updated to use
	the new stale_lv() fixture, and now tests both lvm.invalidateVG() and
	lvm.invalidateCache(). The test for getting all LVs using the cache
	is marked as xfail since we don't use the cache yet.

	Some tests are marked as xfail since getAllVGs(), getAllPVs(), and
	_reloadvgs() may return stubs.

	Bug-Url: https://bugzilla.redhat.com/1553133

	lvm: Fix lvm.invalidateVG()
	Instead of reloading the VG to find the PVs, iterate over the PVs in the
	cache and find all PVs which are part of the VG. This is the same way we
	invalidate LVs.

	The issue was introduced when the invalidatePVs option was added:

	commit 2f494ab1fdeba3aefcff1ccada47cc6b02f9eb96
	Author: Liron Aravot <laravot@redhat.com>
	Date:   Sun Jan 22 19:46:15 2017 +0200

	    getVGInfo - updated pv information

	The only flow using invalidatePVs=True is LVMVolumeGroup.getInfo().

	Bug-Url: https://bugzilla.redhat.com/1553133

	tests: Add tests for lvm.invalidateVG()
	Add missing tests for invalidating a VG and optionally its LVs and PVs.

	The test for invalidating PVs fails since we reload the VG to find
	the PV names. This may cause failures if the VG is not ready
	when it is invalidated, and prevents using lvm.invalidateVG() in the
	tests to test handling of stale items.

	This change increases lvm module test coverage from 75% to 77%.

	Bug-Url: https://bugzilla.redhat.com/1553133

	lvm: Avoid bogus warnings about removing stale items
	When querying a non-existing item, for example during storage domain
	lookup, the _reload functions could log a bogus warning about removing a
	stale item:

	   Removing stale VG a033f14a-6830-4876-be7e-bd2fa2973f07

	Now we warn about removing a stale item only if the item was in the
	cache.

	Bug-Url: https://bugzilla.redhat.com/1553133

2020-03-15  Amit Bawer  <abawer@redhat.com>

	Revert "lvm: Mark stalelv = False when done with reloadlvs operation for all lvs"
	This reverts commit 92765ef491136736310f31195c2498a3ddc7ac96.

	The reverted commit added a stalelv = False update on successful
	completion of a reloadlvs operation in the lvm cache. This improved
	the cache performance of subsequent calls retrieving different lvs of
	the same vg, but it also introduced a regression where reloading all
	lvs of a specific vg would keep the stalelv indicator False until
	the next cache refresh (i.e. storage refresh), causing the lvm cache to
	skip any lvs reload for another vg in between.

	Bug-Url: https://bugzilla.redhat.com/1808850
	Related-To: https://bugzilla.redhat.com/1810143

2020-03-13  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.6

2020-03-12  Ales Musil  <amusil@redhat.com>

	tests: Skip testMethodMissingMethod randomly failing on CI

	net, tests: Replace env variables for functional tests
	Replace env variables with command line arguments.
	Change default functional test to nmstate.

	net, nmstate, tests: Add possibility to run test over nmstate from source
	Users can now specify an nmstate PR id via the
	--nmstate-pr command line argument, or a local
	nmstate source path via --nmstate-source, which
	will install the nmstate version from source.

	This can speed up the process of testing yet
	unmerged/unavailable changes and is helpful for
	nmstate to run vdsm functional tests.

	net, nmstate, tests: Support command line arguments
	Add support for command line arguments in run-tests for
	functional tests.

	net, nmstate, tests: Update DNS inside container
	DNS was forced at container start, which would break
	the container's connection with the rest of the internet.
	Change it in a different step so we can use internet
	access in the script.

2020-03-12  Pavel Bar  <pbar@redhat.com>

	storage: Fixing outOfProcess.fileUtils.createdir (umask)
	- Fixing a bug in the implementation of
	"outOfProcess.fileUtils.createdir":
	take into account the umask when the directory
	that we are trying to create already exists.
	- Removing "xfail" from the test that now passes.

	storage: Fixing outOfProcess.fileUtils.createdir (check isfile)
	- Fixing a bug in the implementation of
	"outOfProcess.fileUtils.createdir":
	raise an "OSError" exception when the directory that we
	are trying to create is actually an existing file.
	- Removing "xfail" from the tests that now pass.

	tests: Improving of outOfProcess.fileUtils.createdir tests
	Additional work on "outOfProcess.fileUtils.createdir":
	- Some code refactoring of the tests.
	- Adding new test flows.
	- Improving "xfail" reason strings and comments.

2020-03-11  Pavel Bar  <pbar@redhat.com>

	storage: Fixing fileUtils.createdir (umask)
	- Fixing a bug in the implementation of
	"fileUtils.createdir":
	take into account the umask when the directory
	that we are trying to create already exists.
	- Removing "xfail" from the test that now passes.

	tests: Adding a new test for fileUtils.createdir
	- New test scenario: the directory that we are
	trying to create already exists.
	- Currently fails due to a bug in "fileUtils.createdir()":
	the mode comparison doesn't take into account the umask.
	- Requested by Nir in:
	https://gerrit.ovirt.org/#/c/107037/2/lib/vdsm/storage/outOfProcess.py@181

	tests: Improving fileutil tests
	- Adding octal representation for mode asserts.
	- 1 more check that the path is a directory.
	- Some local variables renaming for readability.

	tests: Adding verifications for specific error numbers
	- Added assert statements when expecting "OSError" exception
	for specific error types (value.errno).

2020-03-11  Nir Soffer  <nsoffer@redhat.com>

	lvm: Log errors when reloading fails
	Previously when an LVM command failed, we did not know if this was a
	real error (cannot read an existing VG) or an expected condition
	(checking if a VG exists when creating a new storage domain).

	Now LVM command will never fail in this case, so any failure is a real
	error and must be logged as an error.

	Bug-Url: https://bugzilla.redhat.com/1553133

	lvm: Use --select to eliminate errors
	We used to reload pvs, vgs and lvs by specifying the entity in the
	command line. Here are few examples:

	    pvs pv-1 pv-2

	    vgs vg-1

	    lvs vg-1

	    lvs vg-1/lv-1 vg-1/lv-2

	If these commands cannot find one of the specified entities, they fail.
	When a command failed, we invalidated the filter, and if the new
	filter was different, retried the command with the wider filter.

	Since we want to retry failing read-only commands, these commands would
	be retried 10 times in read-only mode. If the entity is missing, all
	retries will fail, slowing down the system without any benefit.

	To avoid this issue, change reload commands to use --select. When using
	select, we can specify which entity we want to query. If the entity does
	not exist, the command does not fail. Here are examples of the same
	commands using --select:

	    pvs --select 'pv_name = pv-1 || pv_name = pv-2'

	    vgs --select 'vg_name = vg-1'

	    lvs --select 'vg_name = vg-1'

	    lvs --select 'vg_name = vg-1 && (lv_name = lv-1 || lv_name = lv-2)'

	Using --select introduces a new problem, since we depend on command
	failure to refresh a stale filter. Fix this by refreshing the filter
	also when a command wants output, but no output was received.

	If a command tries to query multiple entities and one of them is missing
	(--select 'vg_name = x || vg_name = y'), the command will not fail and
	invalidate the filter. The filter will be invalidated in the next domain
	monitor refresh (once every 5 minutes) or when SDCache.refreshStorage()
	is called.

	Bug-Url: https://bugzilla.redhat.com/1553133
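
	For illustration, a sketch of how such --select expressions could
	be built (the helper name and shape are hypothetical):

```python
def lvs_select_cmd(vg_name, lv_names=()):
    """Sketch: build an lvs command line using --select.

    With --select, a missing VG or LV yields empty output instead of
    a command failure, so read-only retries are not triggered.
    """
    expr = "vg_name = %s" % vg_name
    if lv_names:
        names = " || ".join("lv_name = %s" % n for n in lv_names)
        expr = "%s && (%s)" % (expr, names)
    return ["lvs", "--select", expr]
```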

	tests: Use read-write mode for pvs command
	pvs is broken when using locking_type=4 without specifying devices[1].
	Change lvm to read-write mode when using lvm.getPV() to work around
	this issue.

	This can be reverted once we move to Fedora 31, using lvm2 2.03, which
	does not need locking_type=4.

	[1] https://bugzilla.redhat.com/1809660

	lvm: Always remove stale VGs and PVs
	Previously we removed stale VGs and PVs only when running "vgs" or "pvs"
	without specifying the VG or PV names. This was correct since when the
	names are specified, a missing VG or PV causes the LVM command to fail,
	which drops all VGs or PVs, marking all cached items as Unreadable.

	We want to start using --select so LVM commands will not fail when a VG
	or PV is missing. In this case we have to remove the missing items from
	the cache.

	Bug-Url: https://bugzilla.redhat.com/1553133

2020-03-11  Milan Zamazal  <mzamazal@redhat.com>

	hostdev: Fix a bytes x str issue in udev mapping retrieval
	commands.run returns bytes, while we want to report str to Engine.
	Let's convert commands.run output before processing it.
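
	A sketch of the conversion, using subprocess as a stand-in for
	vdsm's commands.run:

```python
import subprocess

def run_text(args):
    # The command output arrives as bytes; decode it before building
    # the str mapping reported to Engine.
    out = subprocess.run(args, stdout=subprocess.PIPE, check=True).stdout
    return out.decode("utf-8", errors="replace")
```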

2020-03-10  Pavel Bar  <pbar@redhat.com>

	storage: Replace get_umask with a thread-safe version
	The old code was not thread-safe since the implementation
	of "get_umask()" was not a single atomic operation.
	It included 2 operations - read & write:
	1) Get the old umask (and temporarily write a new one instead).
	2) Change back to the old umask.
	This is a classic example of a non-thread-safe flow
	when 2 threads might interleave and the result will be
	unexpected and inconsistent.
	The "get_umask()" is used in "createdir()" methods in
	"outOfProcess" and "fileUtils" modules. Both can be executed
	by multiple threads, so the protection is required.
	The function is also used in tests, but tests are run
	(currently) in a single thread, so tests are not the issue.
	The old implementation was more efficient though; average time of
	1000 calls to "get_umask()", in milliseconds:
	  Old implementation: 0.0013393709668889642
	  New implementation: 0.03459923900663853
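
	A sketch of the thread-safe version, serializing the
	read-modify-restore sequence with a lock (simplified relative to
	the actual vdsm code):

```python
import os
import threading

_umask_lock = threading.Lock()

def get_umask():
    # os.umask() can only be read by writing a new value; without a
    # lock, two interleaving threads could observe (and restore)
    # the wrong umask.
    with _umask_lock:
        old = os.umask(0)
        os.umask(old)
        return old
```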

2020-03-10  Benny Zlotnik  <bzlotnik@redhat.com>

	hsm: add a wait after connection to iscsi storage domain
	Add a 5-second (by default)[1] sleep after connecting to an iSCSI
	storage domain. This will help alleviate a race where it takes
	significantly longer for block devices to become visible to the host
	after it logs into the device.

	TODO: A more comprehensive solution is required here. Once
	ovirt-engine will send the list of devices along with the
	connectStorageServer command, we could only wait until the devices
	we want are visible.

	[1] The value can be changed by creating a configuration override file:
	    $ cat /etc/vdsm/vdsm.conf.d/iscsi_login_timeout.conf
	    [irs]
	    iscsi_login_timeout = 2

	Bug-Url: https://bugzilla.redhat.com/1807050

2020-03-10  Bell Levin  <blevin@redhat.com>

	net: Register pytest marks
	Pytest marks should from now on be registered; otherwise pytest throws
	the following warning:
		/usr/local/lib/python3.6/site-packages/_pytest/mark/structures.py:327:
		PytestUnknownMarkWarning: Unknown pytest.mark.nmstate - is this a typo?
		You can register custom marks to avoid this warning - for details,
		see https://docs.pytest.org/en/latest/mark.html
		PytestUnknownMarkWarning,

2020-03-10  Ales Musil  <amusil@redhat.com>

	net: Move southbound validation to common class

	net: Rearrange functions in validator
	Since almost the whole file was refactored
	rearrange the functions, to keep the convention
	of having the public method first followed by
	the private.

	net: Refactor network validation

	net: Refactor Bond validation

	net: Refactor validate_nic_usage
	A consequence of this refactor is fixing the wrongly
	reported NIC in use, which happened when moving a
	network to a different NIC and using the old NIC
	as a bond slave in the same transaction.

	net, tests: Refactor creation of netinfo
	Refactor creation of netinfo used in unit tests
	and move it to common place.

2020-03-09  Amit Bawer  <abawer@redhat.com>

	tests: Add patched resource manager to tmp_repo fixture
	SPM life cycle test needs a fully functioning resource manager,
	yet there are other tests currently using the FakeResourceManager;
	those are the sdm and merge tests. Replacing the fake functionality
	in those tests could be considered if found beneficial.

2020-03-09  Dan Kenigsberg  <danken@redhat.com>

	cmdutils_test: stop using deprecated execCmd

2020-03-08  Eyal Shenitzky  <eshenitz@redhat.com>

	vm: undefine domain using VIR_DOMAIN_UNDEFINE_CHECKPOINTS_METADATA

2020-03-08  Vojtech Juranek  <vjuranek@redhat.com>

	storage: create dedicated module for dmsetup
	Move dmsetup related code to dedicated module.

	Code coverage of the new module is 86%; we currently don't cover flows
	where an exception is thrown or where we call supervdsm.

2020-03-05  Amit Bawer  <abawer@redhat.com>

	blocksd_test: Add test_spm_lifecycle
	Test SPM functionality, currently used for checking
	the transition from non-SPM pool to SPM pool and back.

2020-03-05  Nir Soffer  <nsoffer@redhat.com>

	lvm: Limit number of retries for read-only commands
	Limit the number of retries so that in the worst case, when a
	read-only command is repeated, we don't create too much delay. Start
	with a delay of 0.1 seconds, to make sure we don't run the command
	too early.

	Most failing read-only commands are expected to succeed after one retry,
	delaying the command by 0.1 seconds. In the very unlikely worst case we
	will have 4 retries with delay of 1.5 seconds.

	Some commands like lvm.getVG(name), called when creating a new storage
	domain will always fail because the VG does not exist. When using
	read-only mode we retry such commands because we don't have a good way
	to detect failure because of inconsistent metadata, or missing VG.

	Before this change, this command was retried 10 times causing a delay of
	10 seconds. With this change the command will be retried 4 times,
	causing a delay of 1.5 seconds.

	Bug-Url: https://bugzilla.redhat.com/1553133
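
	The retry policy can be sketched as follows (the function shape is
	illustrative; the defaults match the numbers above: 4 retries,
	0.1 + 0.2 + 0.4 + 0.8 = 1.5 seconds of delay in the worst case):

```python
import time

def run_with_retries(run, retries=4, delay=0.1, factor=2):
    # Initial attempt plus up to `retries` retries, doubling the wait
    # before each retry.
    for _ in range(retries):
        try:
            return run()
        except Exception:
            time.sleep(delay)
            delay *= factor
    return run()
```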

	tests: Limit the number of retries
	Update the reproducer script to limit the number of retries so that
	in the worst case, when a read-only command is repeated, we don't
	create too much delay. Start with a delay of 0.1 seconds, to make
	sure we don't run the command too early.

	Most failing read-only commands are expected to succeed after one retry,
	delaying the command by 0.1 seconds. In the very unlikely worst case we
	will have 5 retries with delay of 3.1 seconds.

	Here is example run with these settings:

	$ python extend.py run-regular \
	    --read-only \
	    --iterations 100 \
	    --extend-delay 10 \
	    -p 8000 \
	    host5 \
	    test \
	    /dev/mapper/36001405271fe76b24b542bf858aaeef7 \
	    /dev/mapper/360014051ce5179112ae4fb98e72d9ba9 \
	    2>run-regular.log

	$ python extend.py log-stats run-regular.log
	{
	    "activating": 5000,
	    "creating": 5000,
	    "deactivating": 4999,
	    "extend-rate": 4.684750527055517,
	    "extending": 99996,
	    "max-retry": 3,
	    "read-only": 109995,
	    "refreshing": 99996,
	    "removing": 4999,
	    "retries": 1374,
	    "retry 1": 1339,
	    "retry 2": 32,
	    "retry 3": 3,
	    "retry-rate": 0.012491476885312968,
	    "total-time": 21345,
	    "warnings": 3766
	}

	Bug-Url: https://bugzilla.redhat.com/1553133

2020-03-04  Amit Bawer  <abawer@redhat.com>

	blockSD: Add _iter_volumes helper
	Refactored into __getVolsTree() and occupied_metadata_slots(),
	and will also serve dumping volumes in an upcoming Bulk API patch.

	blockSD: Add parse_lv_tags helper for parsing lv tags
	This was refactored for blockSD._getAllVolsTree() and will
	also serve the volumes dump in an upcoming patch for the Bulk API.

	constants: Add volume info statuses constants
	This will also serve Bulk API volume statuses report.

	volumemetadata: Add dump() method for dumping the MD dict per volume
	This is done in a format intended to serve upcoming patches for
	dumping volumes metadata in the StorageDomain.dump API.

	Bug-Url: https://bugzilla.redhat.com/1557147

	API: Add dumpStorageDomain() API
	First step in the Bulk API for a storage domain is to have
	a metadata section for the storage domain itself.

	This patch adds this functionality to be extended to
	other metadata sections in upcoming patches.

	Example:

	$ vdsm-client  StorageDomain dump sd_id=367a3231-8bb7-480a-9c0d-e69b0c6a9922

	{
	    "metadata": {
	        "alignment": 1048576,
	        "block_size": 512,
	        "class": "Data",
	        "metadataDevice": "3600140546373b0822a5423d865f46ce7",
	        "name": "big",
	        "pool": [
	            "edefe626-3ada-11ea-9877-525400b37767"
	        ],
	        "role": "Regular",
	        "state": "OK",
	        "type": "ISCSI",
	        "uuid": "367a3231-8bb7-480a-9c0d-e69b0c6a9922",
	        "version": "5",
	        "vgMetadataDevice": "3600140546373b0822a5423d865f46ce7",
	        "vguuid": "T0YCXK-6nkU-MRgg-7dNB-4zKd-UDpX-9kQOvt"
	    }
	}

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-03-04  Nir Soffer  <nsoffer@redhat.com>

	tests: Reproducer for bug 1553133
	In lvm2 < 2.03, lvm commands such as lvs, vgs, pvs, and "lvchange
	--refresh" running on non-spm host may corrupt vg metadata.

	This is a stress test reproducing the issue on lvm2 < 2.03 (RHEL 7.7),
	and verifying that the issue was fixed on lvm2 >= 2.03 (RHEL 8.1).

	To reproduce the issue on lvm2 < 2.03, or verify that the issue was
	fixed with lvm2 >= 2.03, run the regular node without the --read-only
	option.

	To test the fix on lvm2 < 2.03 using locking_type=4, run the regular
	node with the --read-only option.

	The script provides 5 sub commands:
	- run-manager
	- run-regular
	- create-vg
	- remove-vg
	- log-stats

	The run-manager command now starts a simple socket server using a
	JSON line-based protocol.

	The run-regular command starts multiple workers based on --concurrency
	option. Every worker connects to the server and performs this flow:

	    for every iteration:
	        create LV
	        activate LV
	        repeat 20 times:
	            extend LV
	            refresh LV
	            sleep 0-2 seconds
	        deactivate LV
	        remove LV

	The read-write operations (create LV, extend LV, remove LV) are
	performed on the manager node. The read-only operations are performed
	locally.

	Read-only commands are performed using locking_type=1 when --read-only
	is not set, and locking_type=4 when --read-only is set. locking_type=4
	ensures that lvm does not attempt to recover the metadata when the
	metadata header was modified during the read. If a command finds the
	storage in an inconsistent state, it will fail instead of attempting to
	recover the metadata, which is likely to corrupt the VG metadata.

	To handle the expected failures of read-only commands, we use a retry
	mechanism with exponential back-off. When a command fails, we wait and
	retry the command again; each time the command fails we double the wait
	time to let the manager finish pending metadata updates.

	I tested creating 50,000 LVs, and then extending and refreshing them 20
	times, a total of 1,000,000 successful operations at a rate of 29.7
	operations per second.

	I also did some short runs creating 500 LVs, to check retry failure rate
	with lower operation rates.

	Here is a table with the results from all runs:

	extend-delay  extends  retries  max-retry  total-time  extend-rate  retry-rate
	------------------------------------------------------------------------------
	           1  1000000   237250          8       33677        29.69      21.56%
	           2    10000     1027          5         491        20.36       9.33%
	           5    10000      221          2        1138         8.78       2.00%
	          10    10000      100          2        2220         4.50       0.90%
	          20    10000       35          1        4382         2.28       0.31%
	          40    10000        8          1        8585         1.16       0.09%

	Related-To: https://bugzilla.redhat.com/1553133

2020-03-04  Dan Kenigsberg  <danken@redhat.com>

	v2v_test: stop using deprecated execCmd

2020-03-03  Bell Levin  <blevin@redhat.com>

	net, nmstate: Support custom bridge options
	The custom bridge options have been supported by the
	initscripts (ifcfg).

	This change introduces support for configuring bridge options
	through sysfs directly.
	The bridge options are persisted by VDSM (through unified persistence)
	and reported by the network caps.

	The new Bridge class does not support the creation of the bridge,
	because it is not needed for now. Once it is necessary, a
	"create()" method can be added.

	Bug-Url: https://bugzilla.redhat.com/1790503

2020-03-03  Ales Musil  <amusil@redhat.com>

	net, nmstate: Remove dynamic IP workarounds from NM conf file
	Remove config values that are not needed because
	they are already embedded in nmstate.

2020-03-02  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Fix for setting volume options in states {CREATED,STARTED,STOP}
	When a replica 3 volume is set with 'granular-entry-heal' during
	'optimize for virtstore', some conditions should be checked:

	1. When the volume is just created, use 'gluster volume set'
	2. Once the volume is in stopped or started state, use the 'gluster volume heal' command

	Bug-Url: https://bugzilla.redhat.com/1673277

	gluster: Refactoring some 'return True' statements

2020-03-02  Pavel Bar  <pbar@redhat.com>

	storage, task: fix small bug in "__eq__" method
	- Fixed "__eq__" method for "EnumType":
	correctly using "isinstance()" 2nd parameter.
	- Added tests for the above "__eq__" method.
	- Added a few TODOs for future code refactoring.
	Results of the following comments:
	  a. https://gerrit.ovirt.org/#/c/104377/24/lib/vdsm/storage/task.py@255
	  b. https://gerrit.ovirt.org/#/c/104377/24/lib/vdsm/storage/task.py@279
	  c. https://gerrit.ovirt.org/#/c/104377/14/lib/vdsm/storage/resourceManager.py@81

2020-03-02  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Fixed problems in using gluster events api
	Vdsm-gluster was not able to add or update hooks using the events api,
	giving the error below:
	File "<string>", line 2, in glusterWebhookAdd
	  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
	    raise convert_to_error(kind, result)
	UnboundLocalError: local variable 'out' referenced before assignment

	Another error:
	vdsm.rpc.Bridge.InvalidCall: Attempt to call function: <bound method GlusterHook.read
	of <vdsm.gluster.apiwrapper.GlusterHook object at ....
	error: a bytes-like object is required, not 'str'

	This patch fixes the above.

2020-02-28  Milan Zamazal  <mzamazal@redhat.com>

	hostdev: Add debug logging messages to mdev lookup

	hostdev: Add missing return to HostDevice.get_identifying_attrs

	virt: Remove libvirt hot unplug bug workaround

2020-02-27  Kaustav Majumder  <kmajumde@redhat.com>

	py3-gluster: Fixed other verbs using py3 updated _execGluster function
	With the removal of a deprecated execCmd function in gluster/cli.py
	https://gerrit.ovirt.org/#/c/104723/
	some verbs were still using the old (rc, out, err) format, hence
	encountering errors.

	example error:
	File "<string>", line 2, in glusterVolumeSetHelpXml
	  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
	    raise convert_to_error(kind, result)
	ValueError: too many values to unpack (expected 3)

	This patch fixes the above.

2020-02-27  Pavel Bar  <pbar@redhat.com>

	tests: Adding coverage tests to outOfProcess.os.path.lexists
	Adding tests for the previously uncovered
	"outOfProcess.os.path.lexists".
	Improved coverage from 94% to 95%.

	tests: Adding coverage tests to outOfProcess.os.path.exists
	Adding additional flows to improve the test coverage
	for "outOfProcess.os.path.exists".

	tests: Adding coverage tests to outOfProcess.os.path.islink
	Adding additional flows to improve the test coverage
	for "outOfProcess.os.path.islink".
	Improved coverage from 93% to 94%.

	tests: Adding coverage tests to outOfProcess.os.path.isdir
	Adding tests for the previously uncovered
	"outOfProcess.os.path.isdir".
	Improved coverage from 92% to 93%.

2020-02-26  Ales Musil  <amusil@redhat.com>

	net, spec: Bump nmstate version requirement to 0.2.6

2020-02-26  Tomasz Baranski  <tbaransk@redhat.com>

	virt: TSC Frequency needs to be sent in Hz
	In order to start a VM, libvirt needs the TSC frequency exactly as it
	was reported in the capabilities. VDSM now sends it as is, without any
	processing.

	Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1779161

2020-02-26  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: change backup config params to receive UUID
	Changed backup config parameters to validate a UUID object instead of
	a string for backup_id, from_checkpoint_id and to_checkpoint_id.

2020-02-25  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.5

2020-02-25  Amit Bawer  <abawer@redhat.com>

	API: Remove 'storagedomainID' from StorageDomain.ctorArgs
	Use storagedomainID per method instead.
	This is done to keep the legacy API usage intact while allowing the use
	of a different argument name, such as 'sd_id', for methods to be added.

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-02-25  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: add helper to get a socket path for a backup

	backup: freeze vm before backup begins
	We need to freeze the VM before starting a full/incremental
	backup operation.

	When the backupBegin operation ends, we need to thaw the VM.

	backup: add validation for empty disks
	We should not allow starting a backup without disks.

	backup: show backup XML in log message
	Show the new backup XML at log.info level and not at debug.

	backup_test: add tests for backup info

	backup_test: add tests for start/stop backup

2020-02-25  Milan Zamazal  <mzamazal@redhat.com>

	virt: Enable nowait_domain_stats by default
	The option wouldn't be very useful (and tested) if disabled.
	Let's enable it by default; it can be disabled if it causes trouble.

	virt: Move nowait stats configuration option to vars
	The option doesn't deserve its own section, let's move it to `vars'.
	Let's also clarify the purpose of the option.

2020-02-25  Ales Musil  <amusil@redhat.com>

	net, nmstate: Fix MTU reset after bond break
	Recent changes to nmstate showed that we had heavily
	relied on nmstate to reset the MTU after a bond break. Fix
	this flow by doing it explicitly in vdsm.

	net, nmstate: Fix typo in MTU function for bond slaves

2020-02-24  Steven Rosenberg  <srosenbe@redhat.com>

	vdsm: Added nowait flag to the domain stats
	Added the nowait flag to the domain stats for sending
	to libvirt in order to avoid blocking on domain stats
	communication with qemu. Also added a configuration
	flag to enable or disable the sending of the nowait
	flag within the domain stats message.

	The nowait flag within libvirt avoids waiting for
	the proper conditions for supporting domain stats
	communication with qemu when nested jobs are not
	allowed, jobs cannot be set, or other related
	conditions would prevent domain stats communication.

	Bug-Url: https://bugzilla.redhat.com/1613514

2020-02-20  Dan Kenigsberg  <danken@redhat.com>

	mount_test: simplify
	We no longer care about the old location of mkfs, so the code can be
	simplified a bit.

2020-02-20  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: Skip randomly failing tests
	'testClientNotify' and 'test_call' tests fail randomly in CI -
	let's skip them for now.

2020-02-19  Pavel Bar  <pbar@redhat.com>

	tests: Adding coverage tests to outOfProcess.fileUtils.createdir
	- Adding additional flows in order to reach 100% test
	coverage of the above method:
	    1) Testing recursive directories creation.
	    2) Folder already exists scenarios.
	       a) Mode matches.
	       b) Mode doesn't match.
	    3) Can't create inner directories due to wrong permissions.

	tests: Adding coverage tests to outOfProcess.os.stat
	Adding tests for the previously uncovered "outOfProcess.os.stat".

	tests: Adding coverage tests to outOfProcess.os.statvfs
	Adding tests for the previously uncovered "outOfProcess.os.statvfs".

	tests: Adding coverage tests to outOfProcess.os.unlink
	Adding tests for the previously uncovered "outOfProcess.os.unlink".

	tests: Adding coverage tests to outOfProcess.os.rmdir
	Adding tests for the previously uncovered "outOfProcess.os.rmdir".

	tests: Adding coverage tests to outOfProcess.os.rename
	Adding tests for the previously uncovered "outOfProcess.os.rename".

	tests: Adding coverage tests to outOfProcess.os.remove
	Adding tests for the previously uncovered "outOfProcess.os.remove".

	tests: Adding coverage tests to outOfProcess.os.mkdir
	Adding tests for the previously uncovered "outOfProcess.os.mkdir".

	tests: Adding coverage tests to outOfProcess.os.link
	Adding tests for the previously uncovered "outOfProcess.os.link".

	tests: Adding coverage tests to outOfProcess.os.chmod
	Adding tests for the previously uncovered "outOfProcess.os.chmod".

	tests: Adding coverage tests to outOfProcess.os.access
	Adding tests for the previously uncovered
	"outOfProcess.os.access".
	Tests mostly copied from "ioprocess".

	tests: Adding coverage tests to outOfProcess.utils.forceLink
	Adding tests for the previously uncovered
	"outOfProcess.utils.forceLink".
	The method coverage is 100%.

	tests: Adding coverage tests to outOfProcess.utils.rmFile
	Adding an additional flow to improve the test coverage
	for "outOfProcess.utils.rmFile".

2020-02-19  Amit Bawer  <abawer@redhat.com>

	tests: move chmod() contextmanager to storagetestlib
	This contextmanager will be used in upcoming test patches,
	so it is better to have it in a common test library.

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-02-19  Michal Skrivanek  <michal.skrivanek@redhat.com>

	spec: Update qemu-kvm requirement for RHEL AV 8.2
	Bug 1781637 - qemu crashed when do mem and disk snapshot

2020-02-18  Marcin Sobczyk  <msobczyk@redhat.com>

	New release: 4.40.4

2020-02-17  Eyal Shenitzky  <eshenitz@redhat.com>

	backup_test: add fixtures for mocking paths
	Add fixtures to mock:
	- backup.P_BACKUP - mock the path to the backup socket
	- transientdisk.P_TRANSIENT_DISKS - mock the path to scratch disks

2020-02-17  Bell Levin  <blevin@redhat.com>

	net: Remove support for sfq
	VDSM is currently targeted to run on kernel > 3.5.0, therefore
	sfq is rendered useless.

	The problem initially arose when using a kernel that was built
	from custom sources, adding a '+' sign at the end of the version
	(e.g., "3.5.10+"), complicating the parsing of the version due
	to the usage of StrictVersion().

	Bug-Url: https://bugzilla.redhat.com/1793867

2020-02-14  Nir Soffer  <nsoffer@redhat.com>

	vm: Log job UUID for tracked block job events
	When handling COMMIT and ACTIVE_COMMIT block job events, locate the
	block job and log its jobID. This includes the block job event logs
	when grepping the block job flow using the job UUID prefix:

	    grep d58202dd vdsm.log

	Without this you need to manually look up the libvirt/events entries
	related to the block job using the drive name.

	Related-To: https://bugzilla.redhat.com/1802277

2020-02-13  Nir Soffer  <nsoffer@redhat.com>

	vm: Store drive name in block job dict
	Currently we store only the drive pool/domain/image/volume UUIDs under
	the "disk" key. Storing the drive name (e.g. "vda") makes it easy to
	locate the block job from a block job callback event.

	I think this can also help later to simplify the code handling active
	layer commit[1]. Currently we first try to search for the drive using the
	stored "disk" UUIDs. This fails after active commit because the active
	layer volume changes, invalidating the job["disk"]["volumeID"], so we
	have to lookup again using the base volume ID. Using drive name we can
	always find the drive.

	[1] https://github.com/oVirt/vdsm/blob/8c8476c8ca225e067c8c042e60b132804a4e994d/lib/vdsm/virt/vm.py#L5135

	Related-To: https://bugzilla.redhat.com/1802277
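
	A sketch of locating a tracked job by the stable drive name (the
	job dict shape here is illustrative, not the actual vdsm structure):

```python
def lookup_job(block_jobs, drive_name):
    # The drive name ("vda") is stable across active layer commit,
    # unlike the path of the active layer volume.
    for job_id, job in block_jobs.items():
        if job.get("drive") == drive_name:
            return job_id
    return None
```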

	libvirtconnection: Ensure stable block job events
	Register and handle VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 instead of
	VIR_DOMAIN_EVENT_ID_BLOCK_JOB. The second version of the event reports a
	stable value for drive (e.g "vda") instead of the path of the active
	layer which may change after active layer commit or block copy.

	Using the drive name, we can locate the block job and include the block
	job UUID in the job complete event.

	In the future we can also mark the job as completed, avoiding races in
	handling of libvirt block jobs.

	With this change, we would get this log:

	2020-02-07 06:26:09,349-0500 INFO  (libvirt/events) [virt.vm]
	(vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Block job COMMIT for drive
	vda has completed (vm:5835)

	Instead of this (truncated) log:

	2020-02-07 06:26:09,349-0500 INFO  (libvirt/events) [virt.vm]
	(vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Block job COMMIT for drive
	/rhev/data-center/mnt/blockSD/sd-id/images/img-id/vol-id
	has completed (vm:5835)

	Related-To: https://bugzilla.redhat.com/1802277

2020-02-12  Dan Kenigsberg  <danken@redhat.com>

	mount_test: stop using deprecated execCmd

2020-02-12  Benny Zlotnik  <bzlotnik@redhat.com>

	tests: improve diskreplicate_test assertions
	Add assertions to ensure the drive monitor
	was or was not disabled when required

	tests: improve diskreplicate_test infrastructure
	Do not inherit from DriveMonitor. Since we only
	need to track the calls to enable/disable, we can
	implement the two methods instead of using the real
	implementation, which could break our tests if additional
	calls are made in it.

2020-02-12  Amit Bawer  <abawer@redhat.com>

	doc: Add documentation for profiling vdsm using yappi

2020-02-12  Eyal Shenitzky  <eshenitz@redhat.com>

	vmfakelib: add libvirt_error util
	Add libvirt_error(), used to create libvirt errors with libvirt error
	codes and a message.

	Will be used later in testing the backup module.

	backup_test: add infrastructure for fake objects
	Add needed fake objects in order to test start, stop, and backup info
	operations.

	backup_test: add fakedomainadapter
	Add a new fakedomainadapter to mock the DomainAdapter behavior.
	This infrastructure is needed for later use to test full backup.

	backup_test: add tests for full backup XML creation

2020-02-12  Amit Bawer  <abawer@redhat.com>

	lvm: Mark stalelv = False when done with reloadlvs operation for all lvs
	LVMCache._reloadlvs() operation runs on every execution of Vdsm's
	getVolumesList() API, invoking the slow "lvs" command for querying the
	VG's LVs instead of using the existing lvm cache.

	The stalelv flag update was missing at the conclusion of
	a successful reloadlvs() operation, unlike the updates done
	for the respective stale VG and PV flags after reloading them.

	For dump-volume-chains vdsm-tool API, first call getStorageDomainStats()
	to make sure that the LVM cache for the queried SD is refreshed
	before calling getLV() for the following getVolumesList() API calls.

	Profiled stats for "vdsm-tool dump-volume-chains <SD with 1000 disks>"

	- Before patch:

	   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
	   ...
	     1003    0.049    0.000 1051.099    1.048 API.py:1030(StorageDomain.getVolumes)
	     1003    0.017    0.000 1050.128    1.047 <decorator-gen-171>:1(HSM.getVolumesList)
	     1004    0.097    0.000 1049.140    1.045 blockSD.py:719(BlockStorageDomainManifest.getAllVolumes)
	     1004    4.667    0.005 1049.042    1.045 blockSD.py:696(BlockStorageDomainManifest.getAllVolumesImages)
	     1003    0.067    0.000 1047.946    1.045 hsm.py:3278(HSM.getVolumesList)
	     1003    0.008    0.000 1046.985    1.044 sd.py:818(BlockStorageDomain.getAllVolumes)
	     1004    3.611    0.004 1040.376    1.036 blockSD.py:235(getAllVolumes)
	     1052    0.059    0.000 1036.516    0.985 lvm.py:388(LVMCache.cmd)
	     1052    0.117    0.000 1036.103    0.985 lvm.py:271(LVMRunner.run)
	     1052    0.042    0.000 1034.866    0.984 lvm.py:298(LVMRunner._run_command)
	     1004    7.375    0.007 1031.827    1.028 blockSD.py:205(_getVolsTree)
	    11160    0.056    0.000 1028.596    0.092 lvm.py:1114(getLV)
	    11162    0.158    0.000 1028.546    0.092 lvm.py:761(LVMCache.getLv)
	     1011    9.584    0.009 1027.033    1.016 lvm.py:563(LVMCache._reloadlvs)
	   ...

	- After patch:

	    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
	     ...
	     1003    0.024    0.000   39.338    0.039 API.py:1030(StorageDomain.getVolumes)
	     1003    0.010    0.000   38.606    0.038 <decorator-gen-171>:1(HSM.getVolumesList)
	     1003    0.064    0.000   37.515    0.037 API.py:816(Volume.getInfo)
	     1003    0.054    0.000   36.971    0.037 hsm.py:3278(HSM.getVolumesList)
	     1003    0.011    0.000   36.437    0.036 <decorator-gen-157>:1(HSM.getVolumeInfo)
	     1004    0.083    0.000   36.350    0.036 blockSD.py:719(BlockStorageDomainManifest.getAllVolumes)
	     1004    6.195    0.006   36.268    0.036 blockSD.py:696(BlockStorageDomainManifest.getAllVolumesImages)
	     1003    0.006    0.000   35.987    0.036 sd.py:818(BlockStorageDomain.getAllVolumes)
	     1003    0.024    0.000   35.657    0.036 hsm.py:3082(HSM.getVolumeInfo)
	     1003    0.152    0.000   34.010    0.034 volume.py:233(BlockVolumeManifest.getInfo)
	     1004    2.934    0.003   25.792    0.026 blockSD.py:235(getAllVolumes)
	     5023    0.088    0.000   25.444    0.005 selectors.py:365(PollSelector.select)
	      821    0.022    0.000   24.123    0.029 subprocess.py:823(PrivilegedPopen.communicate)
	      821    0.156    0.000   24.097    0.029 subprocess.py:1486(PrivilegedPopen._communicate)
	     1069    0.136    0.000   22.658    0.021 commands.py:190(execCmd)
	       34    0.003    0.000   19.616    0.577 lvm.py:388(LVMCache.cmd)
	       34    0.003    0.000   19.324    0.568 lvm.py:271(LVMRunner.run)
	       34    0.002    0.000   19.316    0.568 lvm.py:298(LVMRunner._run_command)
	       34    0.003    0.000   18.769    0.552 commands.py:168(communicate)
	     1003    0.044    0.000   18.682    0.019 blockVolume.py:91(BlockVolumeManifest.getMetadata)
	     1004    9.062    0.009   18.513    0.018 blockSD.py:205(_getVolsTree)
	     1003    0.019    0.000   18.315    0.018 blockSD.py:901(BlockStorageDomainManifest.read_metadata_block)
	     1003    0.086    0.000   17.954    0.018 misc.py:82(readblock)
	    10/59    0.001    0.000   14.629    0.248 concurrent.py:254(run)
	        1    0.000    0.000   14.092   14.092 hsm.py:411(storageRefresh)
	     1031    0.048    0.000   13.472    0.013 clusterlock.py:459(SANLock.inquire)
	        1    0.001    0.001   13.385   13.385 lvm.py:807(bootstrap)
	     1003    0.033    0.000   13.187    0.013 volume.py:192(BlockVolumeManifest.getLeaseStatus)
	     1003    0.018    0.000   13.117    0.013 sd.py:540(BlockStorageDomainManifest.inquireVolumeLease)
	      823    0.062    0.000   11.075    0.013 subprocess.py:608(PrivilegedPopen.__init__)
	      823    0.202    0.000   10.913    0.013 subprocess.py:1228(PrivilegedPopen._execute_child)
	        2    0.000    0.000   10.680    5.340 lvm.py:986(_setLVAvailability)
	        2    0.001    0.000   10.680    5.340 lvm.py:952(changelv)
	  341/342    0.031    0.000    8.602    0.025 subprocess.py:608(Popen.__init__)
	  341/342    0.082    0.000    8.525    0.025 subprocess.py:1228(Popen._execute_child)
	377167/377171    0.860    0.000    7.843    0.000 betterAsyncore.py:166(Dispatcher._delegate_call)
	     8506    0.114    0.000    5.741    0.001 __init__.py:852(UserGroupEnforcingHandler.handle)
	  3042204    3.103    0.000    5.695    0.000 <string>:12(__new__)
	     8506    0.091    0.000    5.507    0.001 handlers.py:474(UserGroupEnforcingHandler.emit)
	    11051    0.239    0.000    5.496    0.000 lvm.py:764(LVMCache.getLv)
	    11049    0.049    0.000    5.399    0.000 lvm.py:1117(getLV)
	    ...

	Bug-Url: https://bugzilla.redhat.com/1557147
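
	The caching fix can be illustrated with a minimal sketch
	(VGCache and its attribute names are hypothetical, not the real
	LVMCache code):

```python
# After a full reload of the VG's LVs succeeds, clear the stalelv
# flag so later lookups hit the cache instead of re-running the
# slow "lvs" command.
class VGCache:
    def __init__(self):
        self.stalelv = True
        self._lvs = {}
        self.reloads = 0

    def _reload_lvs(self):
        self.reloads += 1
        self._lvs = {"lv1": "info1", "lv2": "info2"}  # pretend "lvs" output
        # The missing piece: mark the cache fresh once the reload is done.
        self.stalelv = False

    def get_lv(self, name):
        if self.stalelv:
            self._reload_lvs()
        return self._lvs.get(name)


cache = VGCache()
cache.get_lv("lv1")
cache.get_lv("lv2")  # served from the cache, no second reload
assert cache.reloads == 1
```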

	lvm_test: Amend test_reload_lvs_with_stale_lv for cached LVM
	The next patch will cause getLV() to reload LVs only when the
	LVM cache is invalidated, so at this point the test is expected
	to fail and is marked with xfail.

	Bug-Url: https://bugzilla.redhat.com/1557147

2020-02-12  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: Fix invalid command in 'test_run_should_log_result'
	'["exit 1"]' is not a valid command to run. The only reason it's
	working is that by default we run commands with
	'reset_cpu_affinity=True', which makes them run with 'taskset'.
	This patch changes the used command to 'false', which gives the same
	result.

	py3: api: Pass bytes as commands' stdin
	'Host.fenceNode' API call runs a command for one of the fencing agents
	underneath. The arguments of this API call are passed to the command
	as standard input stream. To be successful on py3 we need to encode
	this stream to bytes, since unicode strings are not accepted.

	Bug-Url: https://bugzilla.redhat.com/1797477
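
	A minimal sketch of the encoding fix, assuming nothing about the
	real fencing code ("cat" stands in for the fencing agent binary,
	and the option lines are hypothetical):

```python
import subprocess

agent_input = "agent=fence_ipmilan\nlogin=admin\n"  # hypothetical options

# On py3, a command's stdin must be bytes; passing a unicode str
# without text=True raises TypeError, so encode explicitly.
result = subprocess.run(
    ["cat"],  # stand-in for the fencing agent
    input=agent_input.encode("utf-8"),
    stdout=subprocess.PIPE,
)
assert result.stdout.decode("utf-8") == agent_input
```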

2020-02-11  Nir Soffer  <nsoffer@redhat.com>

	protocoldetector: Handle None return value from sock.recv()
	A few years ago we had code checking for None, a possible return
	value from asyncore when a socket is not ready. The code handling
	None was removed and we never had an issue, but the recent Python 3
	work may have introduced this issue again.

	We have a report showing that data is None:

	2019-12-17 16:36:58,393-0600 ERROR (Reactor thread) [vds.dispatcher] uncaptured python exception, closing channel
	<yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:172.20.1.142', 38002, 0, 0) at 0x7fbda865ed30> (
	  <class 'TypeError'>:object of type 'NoneType' has no len()
	  [/usr/lib64/python3.6/asyncore.py|readwrite|108]
	  [/usr/lib64/python3.6/asyncore.py|handle_read_event|423]
	  [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_read|71]
	  [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
	  [/usr/lib/python3.6/site-packages/vdsm/protocoldetector.py|handle_read|115]
	)
	(betterAsyncore:179)

	Try to treat None in the same way we treat empty or short read.

	Bug-Url: https://bugzilla.redhat.com/1785061
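
	The handling described above might look like this minimal sketch
	(the function and class names are illustrative, not the actual
	protocoldetector code):

```python
def handle_read(sock, required_size):
    """Return received data, or None if more data is needed."""
    data = sock.recv(required_size)
    # asyncore may hand us None when the socket is not ready; the old
    # code called len(data) unconditionally, raising TypeError.
    if data is None:
        data = b""
    if len(data) < required_size:
        return None  # same treatment as an empty or short read
    return data


class NotReadySocket:
    def recv(self, n):
        return None


# No TypeError; None is treated like a short read.
assert handle_read(NotReadySocket(), 11) is None
```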

2020-02-11  Marcin Sobczyk  <msobczyk@redhat.com>

	New release: 4.40.3

	ci: Fix used virtualenv version
	The new virtualenv-20.0.1 version seems to break our CI [1].
	Let's pin the version to the one used previously for now.

	[1] https://lists.ovirt.org/archives/list/infra@ovirt.org/thread/VJKVASATTJQ4ZWIISSR7BJIHP74KPESW/

2020-02-10  Tomasz Baranski  <tbaransk@redhat.com>

	virt: Engine needs host device block path
	In order to correctly build libvirt XML when a host device is used as a
	disk using a block driver, the engine needs to know the disk's block
	path used by host OS.

	This patch maps the generic path to the block path and includes it in
	the response to hostdevListByCaps.

	Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1793550

2020-02-10  Michal Skrivanek  <michal.skrivanek@redhat.com>

	spec: Update qemu-kvm requirement for RHEL AV 8.2
	and a minor bump for CentOS 8.1 AdvancedVirtualization

2020-02-06  Andrej Krejcir  <akrejcir@redhat.com>

	hooks: Specify utf-8 encoding when reading result from a hook
	Previously it failed when utf-8 was not the default encoding
	for the active locale.

	Bug-Url: https://bugzilla.redhat.com/1795206
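
	The fix boils down to passing an explicit encoding when opening
	the hook's result file; a sketch with a hypothetical helper name:

```python
import tempfile


def read_hook_result(path):
    # An explicit encoding works even under a POSIX/C locale, where
    # the default codec is ASCII and non-ASCII hook output would fail.
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    f.write("připojeno\n".encode("utf-8"))  # non-ASCII hook output

assert read_hook_result(f.name) == "připojeno\n"
```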

2020-02-06  Steven Rosenberg  <srosenbe@redhat.com>

	Core: Missing readinto function on StreamAdapter
	When performing a KVM Import, the importing within the
	ovirt-imageio module fails due to a missing function
	within the StreamAdapter of the kvm2ovirt module. This
	is a regression that breaks the kvm external import.

	Added the readinto function to the StreamAdapter in order
	to avoid the exception so that the Stream Adapter can
	successfully read the proper data.

	Bug-Url: https://bugzilla.redhat.com/1798175
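
	The missing method can be sketched as below (a simplified
	adapter for illustration, not the actual kvm2ovirt code):

```python
import io


class StreamAdapter:
    """File-like wrapper exposing read() and readinto() over a
    read callable."""

    def __init__(self, read_func):
        self._read = read_func

    def read(self, size=-1):
        return self._read(size)

    def readinto(self, buf):
        # Fill the caller's buffer in place and return the number of
        # bytes written, as the io protocol expects.
        data = self._read(len(buf))
        n = len(data)
        buf[:n] = data
        return n


src = io.BytesIO(b"disk-data")
adapter = StreamAdapter(src.read)
buf = bytearray(4)
assert adapter.readinto(buf) == 4
assert bytes(buf) == b"disk"
```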

2020-02-06  Ales Musil  <amusil@redhat.com>

	net, tests, nmstate: Remove xfail from keep ip test

	net, nmstate: Fix state_apply call
	state_apply should be called with verify_change
	as keyword argument.

	net, nmstate: Set connection timeout to infinity
	Even if DHCP fails, we need to keep the connection
	alive. Add an infinite timeout for the DHCP server response.

2020-02-05  Martin Perina  <mperina@redhat.com>

	Correct nmstate dependency
	Patch https://gerrit.ovirt.org/106465 incorrectly specifies dependency
	on nmstate

2020-02-05  Liran Rotenberg  <lrotenbe@redhat.com>

	virt: async snapshot recovery
	This patch lets Vdsm recover a snapshot job.
	If Vdsm is restarted while performing a snapshot,
	a new recovery job will start.
	The recovery will have the original job ID and configuration.
	It will monitor libvirt by checking the snapshot job status.
	After the snapshot operation finishes, the recovery will
	perform teardown.

	VDSM sends a command to libvirt in order to execute
	the snapshot. In case of recovery, the job might still be
	running in libvirt. domjobinfo shows the job's status,
	and only one job can be executed at a time per VM.
	Therefore, using our VM metadata and checking the
	libvirt domjobinfo status gives us a reliable
	indication of the job's status.

	In case it succeeds, we will have the new snapshot ready
	and the job will be reported to the engine as successfully
	done. If not, it will be reported as failed.

	virt: remove the abuse of jobStats
	The domain is a private property of the VM.
	Now that we have a function to get the job status
	without using the domain directly, we can stop
	abusing it when getting the job status.

	virt: async snapshot refactor
	This patch is a preparatory patch for snapshot recovery.
	No functional changes.

	Bug-Url: https://bugzilla.redhat.com/1749284

	virt: change live snapshot to async
	This patch allows the live snapshot operation
	to be async. The live memory dump was the only
	operation that was still synchronous.

	Now, snapshotCreateXML will be executed on a
	separate thread, making the live memory dump an
	async operation.

	The job UUID given by the engine will be added as a
	job on the VM until it's done, letting the engine
	monitor the job.

	The snapshot job will use Virt jobs mechanism in order to
	know if the memory dump is active while the engine will
	keep polling for the job.

	A follow-up patch will handle the recovery of the job.

	Bug-Url: https://bugzilla.redhat.com/1749284

	virt: preparatory patch for async snapshot
	This patch is a refactoring step before moving the
	snapshot to jobs - for async snapshot.

	A follow-up patch will make the functional changes.

	Bug-Url: https://bugzilla.redhat.com/1749284

2020-02-05  Ales Musil  <amusil@redhat.com>

	net, docker: Track NM 1.22 in func test image

	automation: Update NM dependency to 1.22
	VDSM is targeting CentOS 8.2 and should use
	NetworkManager 1.22 branch.

2020-02-03  Marcin Sobczyk  <msobczyk@redhat.com>

	vdsmd: Make sure 'vdsm' user has proper home dir
	VDSM requires proper home directory to work correctly. If it's missing
	or invalid (i.e. doesn't have correct permissions), most of the flow
	works fine, but some scenarios (the original bug mentions deploying
	metrics or creating templates) fail with errors that are not easily
	associated with the actual problem.

	This patch fixes the issue by refusing to run 'vdsmd' if home directory
	doesn't exist or has invalid permissions.

	Bug-Url: https://bugzilla.redhat.com/1692685

2020-02-01  Edward Haas  <edwardh@redhat.com>

	black: Set black defaults settings for VDSM formatting
	In order to avoid specifying the settings each time black is run,
	define the settings in pyproject.toml.
	The user can now just run `black <path>` to reformat the code.

2020-01-30  Nir Soffer  <nsoffer@redhat.com>

	vm: Log block job events
	When block job (e.g. blockCommit) completes or fails, libvirt sends an
	event. We used to log a debug level message showing the libvirt status
	and job type enum values:

	    unhandled libvirt event (event_name=8, args=("/path", 1, 8))

	This message is not very helpful when debug mode is enabled and is not
	available by default since debug mode is not enabled by default.

	Add VM.onBlockJobEvent(), logging a clear message when a job completes:

	    INFO ... Block job COMMIT for drive /path has completed

	And error message if job has failed:

	    ERROR ... Block job COMMIT for drive /path has failed

	We handle also other events such as "ready" and "canceled" and
	unexpected job status and type.

	We may want to use this event later for optimizing waiting for
	completion.

	Bug-Url: https://bugzilla.redhat.com/1796415
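
	A sketch of such a handler; the enum values below are libvirt's
	virDomainBlockJobType / virConnectDomainEventBlockJobStatus
	values, while the function name and message-returning shape are
	illustrative, not the actual Vm.onBlockJobEvent() code:

```python
import logging

# libvirt enum values
VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT = 3
VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0
VIR_DOMAIN_BLOCK_JOB_FAILED = 1

_JOB_TYPES = {VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT: "COMMIT"}

log = logging.getLogger("virt.vm")


def on_block_job_event(drive, job_type, status):
    """Log a clear message for a block job event and return it."""
    name = _JOB_TYPES.get(job_type, "UNKNOWN")
    if status == VIR_DOMAIN_BLOCK_JOB_COMPLETED:
        msg = "Block job %s for drive %s has completed" % (name, drive)
        log.info(msg)
    elif status == VIR_DOMAIN_BLOCK_JOB_FAILED:
        msg = "Block job %s for drive %s has failed" % (name, drive)
        log.error(msg)
    else:
        msg = "Block job %s for drive %s: unexpected status %r" % (
            name, drive, status)
        log.warning(msg)
    return msg
```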

2020-01-30  Tomáš Golembiovský  <tgolembi@redhat.com>

	v2v: fix progress parsing
	Parsing of progress did not work properly when there were messages
	interleaved between progress lines. Newer virt-v2v is more verbose
	during data copying and we have to adapt.

2020-01-29  Vojtech Juranek  <vjuranek@redhat.com>

	vm: name live merge cleanup thread based on job ID
	Currently we name the thread which does live merge cleanup
	after the ID of the VM for which the live merge runs. The
	thread name is used in the logs, and this makes it hard to
	follow a live merge when several live merge jobs run on the
	same VM. Use the job ID for the thread name instead, i.e. the
	ID of the job that does the live merge and for which the
	cleanup thread is started.
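
	A minimal sketch of the naming scheme (the "merge/" prefix and
	the helper function are hypothetical):

```python
import threading


def start_cleanup_thread(job_id, target):
    # Name the thread after the live merge job ID, not the VM ID, so
    # concurrent merges on one VM are distinguishable in the logs.
    t = threading.Thread(target=target, name="merge/" + job_id[:8])
    t.start()
    return t


seen = []
t = start_cleanup_thread(
    "a5a486b6-4ccb-40fb-9c47-ca1885cd87a9",
    lambda: seen.append(threading.current_thread().name))
t.join()
assert seen == ["merge/a5a486b6"]
```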

2020-01-29  Pavel Bar  <pbar@redhat.com>

	storage, resourceManager: fix pylint warnings (LockState)
	- Removed the "LockState" inner class.
	- Refactored the remaining code that used to work
	with the enum class to produce a much simpler code
	and eliminate the pylint warnings:
	  1) No need in a "__hash__" method.
	  2) A bug in "__eq__" method (using "isinstance") was
	automatically resolved.

	storage, resourceManager: fix pylint warnings (RequestRef)
	- Added the missing methods in "RequestRef" inner class:
	  1) __ne__
	  2) __hash__
	  3) Implemented the missing tests for the newly created
	"__hash__" method.

2020-01-29  Milan Zamazal  <mzamazal@redhat.com>

	virt: Update guest agent API version before making migration params
	When a VM is migrated, guestAgentAPIVersion metadata may not be
	present on the destination.  Then the destination assumes the API
	version is 0 and after_migration event is not sent to the guest.

	The problem is that metadata is updated with guestAgentAPIVersion only
	after _srcDomXML is inserted into migration parameters.  _srcDomXML
	may not have guestAgentAPIVersion metadata in such a case and since
	the destination reads metadata from _srcDomXML it doesn't get the
	guest API version.

	This patch moves update_guest_agent_api_version() call before making
	_srcDomXML, ensuring the guestAgentAPIVersion is present in metadata
	read by the destination.

	Bug-Url: https://bugzilla.redhat.com/1788783

2020-01-28  Ales Musil  <amusil@redhat.com>

	net, nmstate: Fix update of default route network gateway
	Before this patch, updating the gateway would result
	in an error from nmstate, as nmstate assumed that we
	were in fact adding another gateway.

2020-01-28  Bell Levin  <blevin@redhat.com>

	net tests: Add qos editing test
	The test is needed to check basic nmstate functionality.

	The editing scenario is added to the existing add/remove qos
	test.

2020-01-28  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.2

2020-01-28  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, nmstate: remove dynamic ip deactivation workaround
	nmstate-0.2 (the version tracked since [0]) has the fix
	for bug [1].

	As such, we can remove the workaround in vdsm.

	[0] - https://gerrit.ovirt.org/#/c/106328/
	[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1782680

2020-01-28  Ales Musil  <amusil@redhat.com>

	net, nmstate, tests: Add test for moving dynamic address
	On IPv4, the client-id could be defined with the mac
	address of the interface as its only source. Since NM 1.22
	the same parameter can be specified also for IPv6.
	Using the mac address as the only client ID source allows
	moving a dynamic address from a nic to a parent interface
	(bond, bridge, etc...).

	net, nmstate: Use NM default DHCP client

	spec, net: Add dhcp-client package as required
	dhcp-client package provides /etc/NetworkManager/dispatcher.d/11-dhclient.
	This script ensures backward compatibility with our
	dhcp-monitor even with different DHCP client from
	NetworkManager.

2020-01-28  Bell Levin  <blevin@redhat.com>

	net tests: Remove external bond old tests
	Removing these tests since we have them in the new tests already.

	The tests removed by this patch test the following scenarios:
	1) setupNetworks of an external vlan + bond
	2) Persistence and restoration of missing vlan + bond (it is not external
	anymore since it was acquired by vdsm's setupNetworks)

	The following tests are testing the same scenarios:
	1) test_add_net_on_existing_external_vlanned_bond
	2) test_restore_missing_bond
	test_restore_missing_network_from_config

2020-01-26  Dan Kenigsberg  <danken@redhat.com>

	prlimit_test: drop usage of deprecated execCmd

2020-01-24  Nir Soffer  <nsoffer@redhat.com>

	spec: Require available version of nmstate
	Version 0.1.3 is not available on Fedora 30 which is our development
	version at this point:

	$ dnf info nmstate
	Installed Packages
	Name         : nmstate
	Version      : 0.0.8
	Release      : 1.fc30

	We can require newer version only if it is available via a repository
	added by ovirt-release-master rpm.

2020-01-24  Bell Levin  <blevin@redhat.com>

	net, func test: Update ipv4 edit test to include netmask, gw
	Increase test coverage by also editing the gw and netmask
	in the existing IPv4 address edit test.

2020-01-23  Milan Zamazal  <mzamazal@redhat.com>

	virt: Handle incoming migrations of paused VMs
	When a migration of a running VM is completed on the destination, we
	receive and handle VIR_DOMAIN_EVENT_RESUMED_MIGRATED event from
	libvirt.  However, when a paused VM is migrated, libvirt sends
	VIR_DOMAIN_EVENT_SUSPENDED_PAUSED instead.  Since we wait only for
	VIR_DOMAIN_EVENT_RESUMED_MIGRATED, migration completion of paused VMs
	is not recognized and the VM keeps being reported as migrating.

	This patch fixes the problem by handling
	VIR_DOMAIN_EVENT_SUSPENDED_PAUSED event in incoming migrations.

	Bug-Url: https://bugzilla.redhat.com/1660071

2020-01-22  Marcin Sobczyk  <msobczyk@redhat.com>

	systemd: Depend on libvirt's socket units
	Since libvirt switched to systemd's socket activation mechanism
	we should no longer depend on the service unit, but the socket units.

2020-01-21  Milan Zamazal  <mzamazal@redhat.com>

	virt: Update guest agent API version before migration
	In commit 88c6922f2, we removed obsolete migration parameters in
	migration setup.  However update_guest_agent_api_version() call not
	only returned the corresponding migration parameter, but also updated
	metadata with the current version.  This side effect should be
	retained, let's call this method again in migration setup.

2020-01-21  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: Skip randomly failing tests
	These two tests fail randomly in CI - let's skip them for now.

2020-01-21  Milan Zamazal  <mzamazal@redhat.com>

	New release: 4.40.1

2020-01-20  Edward Haas  <edwardh@redhat.com>

	spec: Require nmstate-0.2.1 and up
	For Fedora 31, nmstate-0.1.3 is needed.

2020-01-20  Dan Kenigsberg  <danken@redhat.com>

	execcmd-blacklist: remove already-clean files

2020-01-20  Michal Skrivanek  <michal.skrivanek@redhat.com>

	automation: Add advanced virt repo
	Since this is a module override repo, it must be added to the repos in
	/etc/dnf/dnf.conf, changing just check-patch.repos.el8 wouldn't work.

	Let's also update the qemu-kvm requirement to the version from
	Advanced Virt.

2020-01-20  Edward Haas  <edwardh@redhat.com>

	net, func tests: Limit ifcfg direct manipulation
	Manipulating ifcfg directly fits the initscripts backend but not the
	nmstate backend. With nmstate, it causes the vlan profiles to remain on
	disk and later be autoconnected by other tests.

	Limit the direct manipulation of ifcfg in the net_with_bond_test module.

	net, func tests: Fix assert indentation
	https://gerrit.ovirt.org/#/c/89517 has wrongly indented an assertion in
	the net_with_bond_test module.

	net, func tests: Wait for DHCPv4 IP to be set after response
	Test runs show that even if the DHCPv4 response arrived, it still needs
	a period of time to add the configuration on the interface.
	Therefore, a 1 sec sleep is added when the response is detected.

	net, tests: Edit a bridge with a tap device is supported
	nmstate-0.1.3 and up ignores any unmanaged bridge ports,
	including tap devices.
	The test that checks if a network vlan id may be changed is now passing.

	automation: Update check-patch el8 repo with nmstate-0.2 copr
	VDSM is now targeting CentOS 8.2 and therefore should use nmstate-0.2
	stable branch.

2020-01-17  Pavel Bar  <pbar@redhat.com>

	constants: Removing the deprecated constants
	- Remove "MEGAB" and "GIB" deprecated constants
	after all their usages were replaced by other constants.
	This is the last patch, after all the original
	usages were replaced and merged.

	constants, vmfakelib: Use size constant for MiB
	- Use "MiB" constant instead of magic numbers.

	constants, prlimit_test: Use size constant for MiB
	- Use "MiB" constant instead of magic numbers.

	constants, cmdutils_test: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of magic numbers.

	constants, alignmentscan_test: Use size constant for GiB
	- Use "GiB" constant instead of magic numbers.

	constants, v2v: Use size constant for MiB
	- Use "MiB" constant instead of magic numbers.

	constants, kvm2ovirt: Use size constant for MiB
	- Use "MiB" constant instead of magic numbers.

	constants, vdsm_hooks: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.

	constants, host, api: Use size constants for KiB, MiB
	- Use "KiB" and "MiB" constants instead of
	other constants.
	- Removing old constants from "common/define.py",
	since not used anymore.

	constants, mom: Use size constant for MiB
	- Use "MiB" constant instead of other constants.

	constants, commands: Use size constants for BLOCK_SIZE_4K
	- Use "BLOCK_SIZE_4K" constants instead of magic numbers.
	- Fixing a warning related to wrong function parameter's
	documentation.

2020-01-16  Amit Bawer  <abawer@redhat.com>

	doc: Use code quotes for iscsi-storage-setup.md
	Reference: https://help.github.com/en/github/writing-on-github/basic-writing-and-formatting-syntax#quoting-code

2020-01-16  Bell Levin  <blevin@redhat.com>

	net, old func tests: Remove slaveless bond
	In this case, instead of removing the nics by changing the ifcfg
	files, the nics are removed by the "dummy_devices" context manager,
	and thus the test suits nmstate as well.

2020-01-16  Ales Musil  <amusil@redhat.com>

	net, nmstate: Refactor _set_vlans_base_mtu

	net, tests: Set nmstate disable in vdsm.conf for initscripts run
	Since nmstate was switched on by default, we need to
	pass the opposite config for non-nmstate functional tests.

2020-01-16  Edward Haas  <edwardh@redhat.com>

	spec: Update vdsm-network package spec
	Remove Fedora and RHEL conditions from the spec.

	iproute-tc is now needed on CentOS 8 and up.
	nmstate and initscripts are no longer dependent on specific Fedora
	version as older ones are no longer supported.

2020-01-16  Ales Musil  <amusil@redhat.com>

	net, nmstate: Add support for static IPv6 gateway

2020-01-16  Benny Zlotnik  <bzlotnik@redhat.com>

	storage,api: expose Volume.measure
	Expose Volume.measure, to be used by the engine to get
	the actual required size for a volume, rather than
	inaccurate guesses.

	Bug-Url: https://bugzilla.redhat.com/1712832

	hsm: add _produce_volume helper

2020-01-16  Vojtech Juranek  <vjuranek@redhat.com>

	supervdsm: fix function name
	In the commit

	  commit 913cfd68ac2587207de1f82d86026e97f586afeb
	  Author: Vojtech Juranek <vjuranek@redhat.com>
	  Date:   Mon Jan 6 12:47:27 2020 +0100

	we introduced a call to a non-existing supervdsm function during
	refactoring. This patch fixes that bug.

2020-01-16  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, nmstate: update network backend
	Must add the nmstate mock to the netinfo_test, since 2 unit tests
	were failing.

	To make the integration tests work, we also need to disable nmstate
	for them, since they rely on dummy interfaces, which, when using
	the nmstate backend, are configured with NetworkManager, which is
	not available in that environment.

2020-01-15  Andrej Krejcir  <akrejcir@redhat.com>

	vm: Handle VIR_DOMAIN_EVENT_SHUTDOWN libvirt event
	The event is used to detect if the VM was shutdown from the guest OS,
	or from the host, for example by sending SIGTERM to qemu process.

	If the VM was shut down by signal, it is considered an error,
	and HA VMs will be restarted.

	Bug-Url: https://bugzilla.redhat.com/1783815

	vm: Extract methods for handling libvirt events
	Bug-Url: https://bugzilla.redhat.com/1783815

2020-01-15  Milan Zamazal  <mzamazal@redhat.com>

	virt: Rework host device handling
	Since only Engine XML is supported now, we needn't support the whole
	device handling machinery and can significantly simplify host device
	handling.  Together with it, changes needed for switching to XML
	device format in (not yet implemented in Engine) host device hot
	plug/unplug are made.

	This patch implements the following changes:

	- Original HostDevice classes are removed.  The bits still needed, for
	  detaching and reattaching host devices on the host, are moved to
	  separate functions.

	- Due to the actions performed on the host when hot plugging or
	  unplugging host devices, we still need some HostDevice class.  The
	  class is very simple, serving just as a proxy to the corresponding
	  functions.  Its instances are created only temporarily, just for the
	  given hot plug/unplug action.

	- Hot plug/unplug actions take and return device XML snippets, rather
	  than device descriptions and device names.  They have also been
	  renamed, to avoid any confusion and to be named consistently with
	  other hot plug/unplug methods.

	virt: Don't use _DEVICE_MAPPING for device dictionary initialization
	_DEVICE_MAPPING serves two purposes: To map devices from XML to device
	instances and to initialize Vm device map.  These are different
	purposes and as we move out from the ubiquitous device instances, we
	want to keep them separate.  So we introduce new
	_LEGACY_DEVICE_CLASSES list to initialize the legacy device mapping,
	until it is removed completely.

	virt: Generalized support for hot unplugged device teardown
	We receive a device removal event from libvirt once a device hot
	unplug is finished.  Since device hot unplug can take an arbitrary
	amount of time, this is an asynchronous event and we handle it in a
	callback.

	At the time this event arrives, the device is already gone and is no
	longer present in the domain XML.  We get only the alias of the
	removed device and nothing more.  In order to perform teardown actions
	on the device, we need to store the hot unplugged device somewhere.
	Device instances are currently used for the purpose, but we are trying
	to get rid of them, so we need another mechanism to track the hot
	unplugged devices.

	This patch adds a simplified device class that can be used (in future
	patches) to handle hot plugged devices without tracking them in
	Vm._devices and extends device removal callback with looking up the
	(simplified) removed device in a dictionary of devices being currently
	unplugged.

	The device being unplugged is lost on Vdsm restart.  This is not a
	fundamental regression, since the same could happen if the device was
	removed when Vdsm wasn't running and thus has no longer been present
	in the domain XML on new Vdsm restart.  Device teardown is not
	performed in both the cases.  Vdsm shouldn't be restarted during
	device hot unplug.

	virt: Move lookupDeviceXMLByAlias to a common helper
	It will be used in future patches.

	virt: Don't tear down graphics devices twice
	Graphics devices are already reported from Vm._tracked_devices() and
	needn't be retrieved (and torn down) in Vm._teardown_devices
	separately, once more.

2020-01-15  Bell Levin  <blevin@redhat.com>

	net: Fix '\w' py3 escape deprecation
	Running the unit tests, we get the following warnings:
	/home/blevin/projects/vdsm/lib/vdsm/network/link/validator.py:38
	  /home/blevin/projects/vdsm/lib/vdsm/network/link/validator.py:38:
	DeprecationWarning: invalid escape sequence \w
	    bond for bond in bonds if not re.match('^bond\w+$', bond)

	/home/blevin/projects/vdsm/lib/vdsm/network/link/validator.py:44
	  /home/blevin/projects/vdsm/lib/vdsm/network/link/validator.py:44:
	DeprecationWarning: invalid escape sequence \w
	    and not re.match('^bond\w+$', net_attrs['bonding'])

	This warning will be a syntax error in the future and has to
	be changed.

	Strings in py3 are treated as unicode, and thus the escape
	sequence '\w' is interpreted as escaping the character 'w'.
	To avoid this faulty escape we should convert the string
	to a raw string, which uses different rules for backslashes.

	net: Fix '\D' py3 escape deprecation
	Running the unit test, we get the following warning:
	/vdsm/lib/vdsm/network/models.py:442:
	DeprecationWarning: invalid escape sequence \D
	    nicsRexp = re.compile("^(\D*)(\d*)(.*)$")

	This warning will be a syntax error in the future and has to
	be changed.

	Strings in py3 are treated as unicode, and thus the escape
	sequence '\D' is interpreted as escaping the character 'D'.
	To avoid this faulty escape we should convert the string
	to a raw string, which uses different rules for backslashes.
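
	The fix for both regex patterns is the same; a minimal sketch:

```python
import re

# As a plain string, '^bond\w+$' triggers the "invalid escape
# sequence \w" DeprecationWarning when the module is compiled; the
# raw string below reaches the re module byte-for-byte identical,
# with no escaping issue.
pattern = re.compile(r'^bond\w+$')  # pattern from validator.py

assert pattern.match('bond0')
assert pattern.match('bond_storage')
assert not pattern.match('eth0')
```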

	net: Fix slash escape deprecation warning
	Running the unit test, we get the following warning:
	vdsm/tests/network/unit/ipwrapper_test.py:72:
	DeprecationWarning: invalid escape sequence \
	    'ff02::2 dev veth_23  metric 0 \    cache': (

	This warning will be a syntax error in the future and has to
	be changed.

	Strings in py3 are treated as unicode, and thus the following
	'\' should be escaped.

2020-01-14  Pavel Bar  <pbar@redhat.com>

	constants, storagefakelib: Some minor improvements
	- Removing redundant parentheses.
	- Adding an "_mb" suffix to the method parameter to
	reflect that it's in MiB units.
	- Fixing a wrong comment about the method parameter's
	type.
	- Removing a redundant cast to int and
	inlining a local variable.

	fallocate: minor fixes.
	- Removing a redundant import.
	- Minor comments' fixes.
	- Long lines reorganization.

2020-01-14  Ales Musil  <amusil@redhat.com>

	net, nmstate: MTU handling on net move or MTU reset
	Fix MTU handling for a few scenarios:

	- Lowering the MTU value on any VLAN network
	- Moving network with custom MTU between different
	nic/bonds

	net, nmstate, tests: Slightly refactor mtu functional tests
	- Move nmstate mark to the class
	- Replace hard coded mtu values with constants

	net, tests: Remove all Fedora 29 xfails

2020-01-14  Pavel Bar  <pbar@redhat.com>

	constants, virt: Use size constants for KiB, MiB
	- Use "KiB" and "MiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.
	- Removing redundant constant.

	constants, virt, vmdevices: Use size constant for MiB
	- Use "MiB" constant instead of other constants.

	constants, virt, tests: Use size constants for KiB, MiB, GiB
	- Use "KiB", "MiB" and "GiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.

2020-01-14  Edward Haas  <edwardh@redhat.com>

	net, func tests: Cover dynamic to static IP change with vlans
	The new tests cover scenarios of changing the IP method with and
	without a DHCP server.

2020-01-14  Vojtech Juranek  <vjuranek@redhat.com>

	devicemapper: refactor dmsetup status into dedicated function
	Refactor the calling of dmsetup status into a separate function.
	This removes code repetition and serves as a preparation step to
	move it into a dedicated dmsetup module.

2020-01-13  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, nmstate: Fix network edit while waiting for dhcp response
	A workaround to bug [0] - which prevents the restoration of a
	network having active DHCP in its southbound interface - is used.

	This workaround initializes the configuration on the southbound
	interface, and afterwards it becomes possible to switch that
	interface away from dhcp.

	[0] - https://bugzilla.redhat.com/1782680

	net, tests: add dynamic_ip_switch_to_static without DHCP server
	Add functional tests for switching from dynamic IP to static IP
	without a running dhcp server.

	Dhcp without a server is usually problematic when nmstate is
	used, which warrants these new tests.

	When the DHCP server is *not* started, there won't be any IP
	addresses, and as such, the netfunctestslib assertDHCPv[4|6]
	and assertRoutesIPv[4|6] have to be updated, adding the ability
	to skip IP address checking.

2020-01-13  Pavel Bar  <pbar@redhat.com>

	constants, gluster, thinstorage: Use size constants for KiB
	- Use "KiB" constant instead of magic numbers.

	constants, outofprocess_test: Use "BLOCK_SIZE_4K" size constant
	- Use "BLOCK_SIZE_4K" constant instead of 4096 magic numbers.

	constants, tests, volume: Use size constants for MiB, GiB, PiB
	- Use "MiB", "GiB" and "PiB" constants instead of:
	    a) Magic numbers (with power operator).
	    b) Other constants.
	    c) Other variations to receive the constants integer values.

	constants, tmpfs: Use size constant for GiB
	- Use "GiB" constant instead of other constants.

	constants, tests, sdm: Use various size constants
	- Use "KiB", "MiB" and "GiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.
	- Use "BLOCK_SIZE_512" constant instead of magic numbers.

	constants, sd_manifest_test: Use size constant for MiB
	- Use "MiB" constant instead of other constants.

	constants, nbd_test: Use size constants for KiB, MiB, GiB
	- Use "KiB", "MiB" and "GiB" constants instead of magic numbers.

	constants, mount_test: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of magic numbers.

	constants, loopback_test: Use size constant for GiB
	- Use "GiB" constant instead of magic numbers.

	constants, fakesanlock_test: Use size constants for KiB, MiB
	- Use "KiB" and "MiB" constants instead of magic numbers.

	constants: Convert VG extents size to bytes instead of MiB
	- This patch converts the VG extents size from MiB units to bytes.
	A result of the following comment:
	https://gerrit.ovirt.org/#/c/105153/1/lib/vdsm/storage/constants.py@40

	constants, blockSD: Partly reverting size constants changes
	- Undoing some KiB usages that appear to be
	redundant and not improving the readability.
	- Original patch:
	Ia57563acd2d5a6b2c6c1f01b347d382a4bad75e6

2020-01-13  Liran Rotenberg  <lrotenbe@redhat.com>

	virt: Add boot partition UUID to host capabilities
	This patch adds the boot partition UUID to host
	capabilities.
	It will be consumed by the engine to enable FIPS
	mode on the host with the right kernel boot parameters
	set.

	Bug-Url: https://bugzilla.redhat.com/1692709

2020-01-10  Vojtech Juranek  <vjuranek@redhat.com>

	tests: add more device mapper tests
	Add more tests for devicemapper module, which test multipath_status()
	and getPathsStatus() when there are no multipath devices (empty output)
	or no device mapper devices at all ("No devices found" output).

	This patch increases test coverage from 82% to 84%.

	storage: correctly determine the case when dmsetup doesn't find any devices
	In the commit

	  commit 57153711ef716b0e76d84733aa224102c79e8022
	  Author: Vojtech Juranek <vjuranek@redhat.com>
	  Date:   Fri Sep 13 14:24:48 2019 +0200

	we replaced deprecated misc.execCmd() with commands.run() in
	devicemapper module.

	misc.execCmd() returns output with the lines already split, while
	commands.run() returns raw output whose lines need to be split in
	the code calling this function. We did this in the patch, but
	forgot to change the corresponding branch in the getPathsStatus()
	function, in which we determine the case when no device mapper
	device is found. In this case the output contains only one line
	with no colon ("No devices found"). Using the unsplit output, we
	fail to determine this case and fail with

	  error=not enough values to unpack (expected 2, got 1) (dispatcher:87)

	Use the split output to determine the case when no device is found.

	Bug-Url: https://bugzilla.redhat.com/1766595
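
	A simplified sketch of the fixed branch (function and variable
	names here are illustrative, not vdsm's actual code):

```python
def get_paths_status(out):
    # commands.run() returns the raw bytes; split the lines here,
    # in the caller, as misc.execCmd() used to do for us.
    lines = out.decode("utf-8").splitlines()
    # With no device mapper devices, dmsetup prints a single line
    # with no colon ("No devices found") instead of "name: status".
    if len(lines) == 1 and ":" not in lines[0]:
        return {}
    return dict(line.split(":", 1) for line in lines)

assert get_paths_status(b"No devices found") == {}
```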

	tests: add getPathsStatus() test when no device is found
	Add a test for the getPathsStatus() function from the devicemapper
	module when no device mapper device is found. If there is no device
	mapper device on the host, dmsetup returns the message "No devices
	found" on stdout (which is a bug, it should send it to stderr - see
	https://bugzilla.redhat.com/1787541).
	We need to distinguish this case from regular dmsetup output,
	otherwise we fail with an error parsing the dmsetup output.
	This patch adds a test for this case.

	This test increases test coverage from 80% to 82%.

	Bug-Url: https://bugzilla.redhat.com/1766595

	tests: refactor fake dmsetup script
	Refactor the fake dmsetup script so we can more easily change the
	faked dmsetup output. Using different outputs is needed for a test
	that will be added in a subsequent patch. Use a similar approach to
	the one used in the multipath test - include the script in the
	test module and write the script content in each test using
	fake_executable.
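
	The pattern can be sketched as follows (a hypothetical helper,
	not vdsm's actual fake_executable fixture):

```python
import os
import stat
import subprocess
import tempfile

def fake_executable(dirname, script):
    # Write an executable "dmsetup" whose output the test controls.
    path = os.path.join(dirname, "dmsetup")
    with open(path, "w") as f:
        f.write(script)
    os.chmod(path, stat.S_IRWXU)
    return path

# Each test writes exactly the output it needs the tool to produce:
with tempfile.TemporaryDirectory() as tmp:
    exe = fake_executable(tmp, "#!/bin/sh\necho 'No devices found'\n")
    out = subprocess.check_output([exe])
```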

2020-01-09  Pavel Bar  <pbar@redhat.com>

	constants, lvm: Consistent usage of "vdsm.storage.constants"
	Changing the usage of the "vdsm.storage.constants"
	constants to be consistent: always via aliased
	module name. Versus current situation that some of
	the constants are explicitly imported, while others
	are used via the aliased module name.

2020-01-08  Amit Bawer  <abawer@redhat.com>

	mpathhealth, udev: Switch to a polling multipath health monitoring
	This patch removes the pyudev dependency and the udev listener
	implementation, switching to a simple polling multipath health
	monitoring thread that uses the dmsetup status command to
	periodically check the status at 10-second intervals (configurable
	via the 'sd_health_check_delay' value).

	The change eliminates the need to maintain event handlers and track
	differences relating to an initial dmsetup call.

	Bug-Url: https://bugzilla.redhat.com/1651772
	Bug-Url: https://bugzilla.redhat.com/1768467

	mpathhealth, udev: Remove device name resolution from monitored paths
	Handled scenario:

	1. Getting path status from "dmsetup status"
	   in devicemapper.multipath_status()

	2. While processing results, device "8:11" is removed from the system.

	3. Trying to resolve the name of the device fails,
	   returning "8:11" silently.

	4. "8:11" is injected by mistake into multipath health monitor.

	For a four-path setup the wrong report would look like this:

	    "valid_paths": 2
	    "failed_paths": ["8:11", "sda", "sdb"]

	Since we will never get an event for "8:11", we will always report it
	in the failed paths until vdsm is restarted.

	Dropping the device name resolution has no effect on Vdsm and Engine
	operation, as Engine only displays the failing hosts with their relevant
	DM_UUIDs and uses the number of failing device paths as an indicator,
	regardless of their actual names.

	Bug-Url: https://bugzilla.redhat.com/1651772

2020-01-08  Marcin Sobczyk  <msobczyk@redhat.com>

	supervdsm: Log errors encountered during shutdown
	During 'supervdsmd' shutdown we want to avoid any exceptions being
	raised, so as not to make systemd restart the service. Silently
	accepting any exception, however, is an anti-pattern that allows
	hiding serious bugs.
	This patch adds logging to syslog of any problem that occurred
	during shutdown.

2020-01-07  Ales Musil  <amusil@redhat.com>

	net, nmstate: Replace deprecated nmstate schema usage

2020-01-06  Yedidyah Bar David  <didi@redhat.com>

	Do not fail adding a console ticket without a vnc username
	Without this patch, 'hosted-engine --add-console-password' fails with:

	Command VM.updateDevice ... failed:
	(code=100, message=General Exception: ('expected string or bytes-like
	object',))

	Bug-Url: https://bugzilla.redhat.com/1786451

2020-01-05  Edward Haas  <edwardh@redhat.com>

	net, nmstate: Edit a net vlan id removes the old vlan
	Editing a network vlan id needs to remove the old vlan interface.

	Fixed by considering a vlan ID change without its base interface
	changing as a network base iface change.

	net, tests: Extract nmstate vlan tests to nmstate subpackage

	net, tests: Extract nmstate helpers to nmstate.testlib

2020-01-02  Amit Bawer  <abawer@redhat.com>

	docker: Remove dockerfiles for centos 7 and fedora 29
	They are no longer part of the 4.40 target platforms.

2020-01-02  Bell Levin  <blevin@redhat.com>

	net, CI: Add nmstate func tests to check-patch
	Performing network func tests on a regular basis will
	improve the development of nmstate and help avoid
	introducing new bugs to existing nmstate tests.

	The nmstate tests are still a little flaky on CI, and sometimes
	fail, thus the stage returns success whether the tests passed
	or not.

	The tests will be run only if some network files were
	changed, and do not take much extra time since the run is
	based on a container.

2020-01-01  Pavel Bar  <pbar@redhat.com>

	constants: Introduce a new "HOSTS_MAX" constant with the value of 2000
	- Add a new "HOSTS_MAX" constant while also keeping the current
	"HOSTS_512_1M" & "HOSTS_4K_8M".
	The constant is used where the flow is relevant for both block sizes.
	Sanlock supports up to 2000 hosts and we don't have a use case for
	more hosts. This limit comes from the sanlock implementation and
	cannot change.

	constants, localfssd_test: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of:
	    a) Magic numbers (with power operator).
	    b) Other constants.

	constants, clusterlock_test: Use "BLOCK_SIZE_X" & "HOSTS_Y" size constants
	- Use "BLOCK_SIZE_X", "HOSTS_Y" constants instead of magic numbers.
	- Splitting 1 test function testing 2 different illegal parameters
	into 2 separate functions, each testing 1 illegal parameter at a time.

2019-12-31  Eyal Shenitzky  <eshenitz@redhat.com>

	caps: support backup operation if libvirt supports it

2019-12-29  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: add scratch disks creation for start_backup
	When a backup starts, a scratch disk should be created for each disk
	that the backup includes.

	In order to create these scratch disks, we have two options -
	1) Let Libvirt create/remove the scratch disks for us.
	2) Create and remove the scratch disks in a custom way.

	The problem with the first option is that libvirt will create the
	scratch disk in the same path as the original volume.
	We cannot allow it since we don't want to create and maintain anything
	under the shared storage volume paths.

	So this patch implements the second option: when a backup starts,
	all the scratch disks are added using the transientdisk API.

	When the backup stops, the scratch disks will be removed.

	Example for the backup XML that includes specific VM disks-
	<domainbackup mode='pull' id='0'>
	  <server transport='unix' socket='/run/vdsm/backup-id'/>
	  <disks>
	    <disk name='vda' type='file'>
	      <seclabel model="dac" relabel="no" type="none"/>
	      <scratch file='/path/to/scratch/disk.qcow2'/>
	    </disk>
	    <disk name='sda' type='file'>
	      <seclabel model="dac" relabel="no" type="none"/>
	      <scratch dev='path/to/scratch/disk.qcow2'/>
	    </disk>
	  </disks>
	</domainbackup>

	Scratch disks are created and removed using the HSM transient disk APIs.

	backup: implement stop_backup
	End a VM running a backup using the backup
	job id (currently always equal to 0).

	In case of a failure stopping the backup job
	BackupError will be raised.

	backup: implement backup_info
	backup_info will get the backup XML from libvirt by sending it
	the backup job id which is currently 0.

	backup_info will return to the engine a map of the image IDs that
	participate in the backup and the NBD URLs for those images' backups.

	for example:
	'disks': {
	    "123":
	        "nbd:unix:/var/run/vdsm/backup/123:exportname=sda",
	    "456":
	        "nbd:unix:/var/run/vdsm/backup/456:exportname=sdb",
	}

	backup: implement start_backup to support full backup
	Starts a full backup for a specified VM.

	start_backup creates an XML from the given parameters in the following
	format:

	<domainbackup mode='pull'>
	    <server transport='unix' socket='/run/vdsm/backup-id.sock'/>
	</domainbackup>

	The XML doesn't specify any of the VM disks (will be added in a later
	patch), therefore, the backup will include all of the VM disks.

	Result XML from libvirt when running backupGetXMLDesc():

	<domainbackup mode='pull' id='0'>
	  <server transport='unix' socket='/run/vdsm/backup-id.sock'/>
	    <disks>
	      <disk name='vda' type='file'>
	        <driver type='qcow2'/>
	        <scratch file='/path/to/scratch/disk.qcow2'/>
	      </disk>
	      <disk name='sda' type='file'>
	        <driver type='qcow2'/>
	        <scratch file='/path/to/scratch/disk.qcow2'/>
	      </disk>
	...
	  </disks>
	</domainbackup>

	Starts a backup for the specified VM by calling
	libvirt backupBegin() job.

	The path to a socket is passed to QEMU and it will create the socket.
	Imageio will then connect to this socket during backup.
	All the sockets are stored under VDSM_RUN_DIR/backup.

	Example for VDSM log when starting a backup -
	INFO  (jsonrpc/2) [api.virt] START start_backup(config={'backup_id': '123',
	'disks': [...], 'from_checkpoint_id': None, 'to_checkpoint_id': None}
	) from=..., flow_id=789, vmId=345

	Example for successful backup operation return value-
	{
	  'disks':{
	    '111':'nbd:unix:/var/run/vdsm/backup_sockets/111.sock:exportname=sdb',
	    '222':'nbd:unix:/var/run/vdsm/backup_sockets/222.sock:exportname=sda'
	  }
	}

	storage: introduce transientdisk module
	The transientdisk module was created in order to provide an API
	for adding temporary disks in a temporary directory.

	The module is responsible for creating the transient disks
	directory under /var/lib/vdsm/storage/transient_disks, which is
	created when VDSM starts. When all the transient disks have been
	removed from the directory, the directory itself is deleted.

	This module is needed for creating "scratch" disks for the full VM
	backup, which is the first step in the incremental backup flow.

	The creation and removal of the transient disks are added as a
	public API for future usage; currently the API is used internally
	only.

2019-12-29  Amit Bawer  <abawer@redhat.com>

	contrib: Add target-tools for creation and deletion of iSCSI target LUNs
	Provided from git://git.engineering.redhat.com/users/nsoffer/vdsm-tools.git

	doc: Add iSCSI storage setup documentation

2019-12-29  Edward Haas  <edwardh@redhat.com>

	net, tests: Cleanup tap device NM profile
	When nmstate is used as the backend, the NM profile also needs to be
	removed.

2019-12-25  Nir Soffer  <nsoffer@redhat.com>

	nbd: Allow multiple clients
	We can increase upload and download performance using multiple
	connections, but current qemu-nbd command line allows only one client
	connection. Configure qemu-nbd to allow up to 8 clients.

	According to qemu-nbd(8), consistency is not guaranteed between
	multiple writers, but Eric Blake says that it should be safe if
	clients write to distinct areas.
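
	In qemu-nbd terms this maps to the --shared option; the socket
	path and image below are illustrative:

```shell
# Serve one export over a unix socket, allowing up to 8 concurrent
# client connections (the historical default is a single client).
qemu-nbd --shared=8 --socket=/run/vdsm/nbd/example.sock \
    --format=qcow2 --persistent /path/to/volume.qcow2
```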

2019-12-24  Edward Haas  <edwardh@redhat.com>

	net, tests, func: Stop using nested bash with container shell
	When using the `--shell` argument, do not use a nested bash.

	net, nmstate: Fix detecting stale ifaces for sb-less nets
	VDSM supports the creation of networks with no southbound iface
	(nic/bond).
	Consider such cases when detecting stale interfaces.

2019-12-24  Bell Levin  <blevin@redhat.com>

	net tests: Add vlan change with tap test
	This scenario was found to be failing with nmstate.
	It does not reproduce with just a linux bridge, and does not
	reproduce without a tap device connected.

	The test fails while changing the vlan id (editing) while a tap
	device is connected.

	net tests: move tap init functions to the test lib
	Creating tap devices will be needed in the future, thus moving
	these functions to be a common helper is needed for the func tests.

2019-12-22  Bell Levin  <blevin@redhat.com>

	net: xfail a skipped stable link test
	This test occasionally gets a "state up -> down" exception.
	Later on we want this test to be added to nmstate,
	and it should not be skipped.

2019-12-20  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, nmstate: reset the link mtu on network detach
	With this fix, another link_mtu test can be tagged to run when
	nmstate is selected as the networking backend.

	net, tests, nmstate: assure the base iface mtu is preserved
	When there are multiple networks on top of a single base interface
	and one of them is removed, the base iface MTU cannot be reset,
	since that base iface is still in use by other networks.

	This added test assures this intended behavior.

2019-12-18  Nir Soffer  <nsoffer@redhat.com>

	build: Remove --enable-4k-support configuration
	Since 4k is enabled by default, and can be switched off using vdsm
	configuration, there is no need to keep the configure option.

	spec: Enable 4k support by default
	The required qemu-kvm version is available now on all supported distros
	(RHEL 8.1, Fedora 30, 31) so we don't need to enable it during
	configure. The option is kept to allow disabling this feature if needed.

	To disable 4k support configure vdsm with:

	    ./autogen.sh --disable-4k-support

	Bug-Url: https://bugzilla.redhat.com/1748022

	spec: Update qemu-kvm requirement for RHEL AV 8.1
	This version includes the fixes needed to enable 4k for gluster.

	To enable 4k support when building on RHEL 8.1:

	    ./autogen.sh --enable-4k-support

	Bug-Url: https://bugzilla.redhat.com/1748022
	Related-to: https://bugzilla.redhat.com/1749134

2019-12-18  Amit Bawer  <abawer@redhat.com>

	resourcemanager: Raise internal ResourceDoesNotExist for an unfound resource
	The legacy behavior was to hide the KeyError raised by
	ResourceManager.acquireResource() flow in Owner.acquire() error handling[1],
	not exposing the exception or a False return value to the API calls.

	This change adds the exception alert in case of attempting to acquire
	an unknown resource, causing the Owner.acquire() call to raise in case
	the raiseonfailure flag is set, and returning False for such case as well.

	[1] https://github.com/oVirt/vdsm/blob/8e53f0f22f3b52f39f4b47d273c279c9b27f1156/vdsm/storage/resourceManager.py#L687
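
	An illustrative sketch of the changed behavior (names are
	hypothetical, not vdsm's actual code): instead of swallowing the
	KeyError, surface a dedicated exception.

```python
class ResourceDoesNotExist(Exception):
    pass

_RESOURCES = {("00_storage", "sd1"): "lock"}

def acquire(namespace, name, raiseonfailure=True):
    try:
        return _RESOURCES[(namespace, name)]
    except KeyError:
        # Unknown resource: raise or return False, depending on
        # the caller's raiseonfailure flag.
        if raiseonfailure:
            raise ResourceDoesNotExist(f"{namespace}.{name}")
        return False
```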

	resourcemanager: Refactor fullName to snake case full_name

2019-12-18  Bell Levin  <blevin@redhat.com>

	net tests: Add test for vlan tag change

2019-12-18  Edward Haas  <edwardh@redhat.com>

	black: Report expected changes for black validation to pass
	When black formatting tool fails the validation, output the diff of the
	expected format changes.

	net, tests, func: Remove old source route test
	The removed test is already covered by tests in static_ip test module.
	(e.g. test_create_net_without_default_route)

2019-12-17  Pavel Bar  <pbar@redhat.com>

	constants, clusterlock_test: Use size constant for MiB
	- Use "MiB" constant instead of magic numbers.

	constants, blockdev_test: Use size constants for KiB, MiB & BLOCK_SIZE_4K
	- Use "KiB" and "MiB" constants instead of magic numbers
	(with power operator).
	- Use "BLOCK_SIZE_4K" constant instead of magic numbers.

	constants, backends_test: Use "BLOCK_SIZE_XYZ" size constants
	- Use "BLOCK_SIZE_512" & "BLOCK_SIZE_4K" constants instead of
	512 & 4096 magic numbers.

2019-12-17  Marcin Sobczyk  <msobczyk@redhat.com>

	tool: Add 'show-default-config' command
	Before 4.2, it was possible to print vdsm's configuration
	file by running 'vdsm/common/config.py' as a script:

	 python /usr/lib/python2.7/site-packages/vdsm/common/config.py

	This is not possible anymore due to a clash between the
	'vdsm.common.glob' module and the 'glob' module from the
	standard library.

	This patch adds a 'show-default-config' command to 'vdsm-tool' that
	prints out the default configuration for vdsm.

	Additionally, 'config.py' module can still be used to print out the
	default config with this syntax:

	 python3 -m vdsm.common.config

	Bug-Url: https://bugzilla.redhat.com/1529344

2019-12-17  Amit Bawer  <abawer@redhat.com>

	resourcemanager: Add exception for InvalidNamespace
	Update tests accordingly.

	resourcemanager: Replace deprecated log warn() calls with warning()

	resourcemanager_test: Remove usage of message parameter in pytest.raises()
	Since pytest 4.1, the usage of message parameter in pytest.raises() is
	deprecated and the only alternative is to use pytest.fail() for
	failing while printing the message.[1]

	In order to make the test code more readable and less bulky,
	we simply remove this parameter from pytest.raises() calls.

	[1] https://docs.pytest.org/en/4.6-maintenance/deprecations.html#message-parameter-of-pytest-raises
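
	The migration is mechanical; a minimal sketch of the new-style
	usage (the test body is illustrative):

```python
import pytest

def test_rejects_garbage():
    # pytest 4.1+: no message= parameter; on failure pytest itself
    # reports that the expected exception was not raised.
    with pytest.raises(ValueError):
        int("not a number")
```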

	resoucemanager: Use InvalidLockType exception for an invalid resource lock
	Also add tests for this case.

2019-12-16  Edward Haas  <edwardh@redhat.com>

	net, tests, func: Move network with no persist to the new tests

	net, tests, func: Remove old style restore tests
	The removed tests are covered by the tests in the
	netrestore test module.

	net, tests, func: Add test that restores a missing bond

	net, tests, func: Add test that restores static ipv4 config
	The restoration occurs on an existing network (editing).

	net, tests, func: Move network stats test to the new format
	Simplify the existing stats tests and move it to the new functional
	tests format (pytest based).

	net, tests: Remove old functional upgrade test
	VDSM 4.4 does not support an in-place upgrade path.
	Therefore, old configuration is not expected when deploying VDSM.

	The old functional test that covered this scenario is therefore safe to
	remove.

	net, tests, func: Run restore dynamic ipv4 net with non-nmstate
	Restoring a dynamic IPv4 network is not working with nmstate but works
	well with initscripts.

	This change also renames the test to reflect that it restores a missing
	network.

2019-12-16  Amit Bawer  <abawer@redhat.com>

	resourcemanager: Use InvalidResourceName exception for invalid resource names
	Also pass the used invalid name as the exception value.

	resourcemanager_test: Use monkeypatch for setting FakeResourceManager
	This eliminates the last use of the MonkeyPatch decorator
	in resourceManager tests.

	resourcemanager_test: Apply tmp_manager fixture

	resourcemanager_test: Use pytest parameterize for testResourceLockSwitch
	Eliminate the need to call it from other tests and its default param.

	resourcemanager: Add internal exception for an already acquired resource
	Update tests accordingly.

	resourcemanager_test: Add tests for Owner.acquire() failing flows
	Added tests improves CI coverage for resourcemanager to 93%.

	resourcemanager_test: Add test for Owner.acquire() and releaseAll()
	Added test improves CI coverage for resourceManager from 84% to 90%.

2019-12-16  Marcin Sobczyk  <msobczyk@redhat.com>

	ci: Disable randomly failing tests
	'test_notify' and 'test_multiple_queues' test cases from
	'stomprpcclient_test' module fail randomly in the CI - let's
	skip them for now.

2019-12-16  Milan Zamazal  <mzamazal@redhat.com>

	virt: Limit payload path workarounds to clusters < 4.3
	In commit dbe4fd2cc67bccb40a2699717341b774f3b53bc9, we added
	workarounds to pass legacy payload paths when migrating in
	clusters < 4.4.  Now, when support for payload path changes on
	migrations has been backported to 4.3, the workarounds are needed only
	in clusters < 4.3.

2019-12-16  Pavel Bar  <pbar@redhat.com>

	constants, xlease_test: Use size constant for GiB
	- Use "GiB" constant instead of other constants.

	constants, qemuimg: Use size constants for KiB, MiB, GiB
	- Use "KiB", "MiB" and "GiB" constants instead of:
	    a) Magic numbers (with power operator).
	    b) Other constants.
	    c) Other variations to receive the constants integer values.

	constants, misc: Use size constants for KiB, MiB, GiB, TiB
	- Use "KiB", "MiB", "GiB" and "TiB" constants instead of:
	    a) Magic numbers and usage of shift left operator.
	    b) Other constants.

	constants, merge: Use size constants for KiB, MiB, GiB
	- Use "KiB", "MiB" and "GiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.

	constants, mailbox: Use size constants for KiB, MiB, GiB
	- Use "KiB", "MiB" and "GiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.

	constants, imageSharing: Use size constants for KiB, MiB
	- Use "KiB" and "MiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.

	constants, imagetickets_test: Use size constant for GiB
	- Use "GiB" constant instead of magic number (with power operator).

	constants, image: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of other constants.

	constants, fileVolume: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of:
	    a) Magic numbers (with power operator).
	    b) Other constants.

	constants, fakelib: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.
	    c) Other variations to receive the constants integer values.

	constants, testlib: Use size constants for KiB, MiB
	- Use "KiB" and "MiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.

	constants, formatconverter: Use size constants for MiB, GiB & XLEASES_SLOTS
	- Use "MiB", "GiB" & "XLEASES_SLOTS" constants instead of:
	    a) Magic numbers (with power operator).
	    b) Other constants.

	constants, storage, lvm: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of:
	    a) Magic numbers (with power operator).
	    b) Other constants.
	- Reusing "utils.round()" instead of code that does the same.

2019-12-15  Bell Levin  <blevin@redhat.com>

	net tests: Add change mtu test on same net and nic

	CI: Remove el7 check-networks substages
	Since el8 came out, el7 became deprecated and should not
	be tested against.

	After adding the linux bridge and ovs tests to run on el8,
	[1], [2], we have no use for el7 anymore.

	[1] https://gerrit.ovirt.org/#/c/105061/
	[2] https://gerrit.ovirt.org/#/c/105470/

	CI, net: Update OVS func tests to run on el8
	Since el8 came out, el7 became deprecated and should not
	be tested against.

	For performance reasons the tests were moved from lago to
	a container (cutting the job duration in half).

2019-12-13  Pavel Bar  <pbar@redhat.com>

	directio: comments & TODOs fixes
	- New TODOs to be fixed later. Result of the following comments:
	  https://gerrit.ovirt.org/#/c/105155/1/lib/vdsm/storage/directio.py
	- Fixed typos.
	- Removed redundant parentheses.

	multipath_test: Fixing typo in "fake_executable" fixture

	hsm: renaming variables that hold values in MiB
	- Adding an "_mb" suffix to variables containing values
	in MiB units.

	hsm: comments & TODOs fixes
	"extendVolume()" method updates.
	- Updated "size" parameter's description.
	- New TODO to be fixed later. Result of the following comment:
	  https://gerrit.ovirt.org/#/c/104367/9/lib/vdsm/storage/hsm.py@707

2019-12-12  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, tests, nmstate: xfail the IP netrestore test
	This test currently fails because NM does not support disabling the
	IP stack while a DHCP session is being established.

	The issue is tracked in bug [0].

	[0] - https://bugzilla.redhat.com/1782680

2019-12-12  Bell Levin  <blevin@redhat.com>

	CI: Add nmstate tests to net func tests

2019-12-12  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: Skip randomly failing test
	'testMethodReturnsNullAndServerReturnsTrue' test fails randomly in CI -
	let's skip it for now.

2019-12-12  Amit Bawer  <abawer@redhat.com>

	automation: Use alternative fc30-updates-debuginfo repo
	The listed repo is not always reachable and fails CI.

2019-12-12  Milan Zamazal  <mzamazal@redhat.com>

	doc: Document the release process

2019-12-12  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: Disable two tests that fail randomly in CI
	Two tests from 'stomprpcclient_test' module fail randomly in the CI.
	This issue is discussed in the mailing lists [1]. For now let's disable
	them.

	[1] https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/OXD43H6JPFCLZMO6UKIMJHS23TADMBKC/

2019-12-11  Ales Musil  <amusil@redhat.com>

	net, nmstate: Skip delay from dhcp monitoring notification
	Delay that was waiting for ifup to complete is not
	needed anymore with nmstate backend.

2019-12-11  Bell Levin  <blevin@redhat.com>

	CI: Move net-func-tests legacy to an el8 container

	net tests: Add choice to run with docker
	Podman cannot be run in the CI, thus a change to docker is needed.

2019-12-11  Milan Zamazal  <mzamazal@redhat.com>

	spec: Remove an extra blank line before %changelog

	spec: Fix typo in vdsm_release setting example

	configure: Update Vdsm mail address

2019-12-11  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	automation: track nmstate-0.1
	Track nmstate-0.1 instead of nmstate master.

2019-12-11  Marcin Sobczyk  <msobczyk@redhat.com>

	ci: Fix python3-ioprocess fc30 package name
	'python3-ioprocess-1.4.0' has been released on fc30, so we're dropping
	the specific version we were pointing to as a dependency.

2019-12-11  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, tests, nmstate: tag netrestore tests for nmstate
	The test_restore_missing_network_from_config test works out of the
	box.

	On follow-up patches the test_restore_dynamic_ipv4_network test
	will also be marked as possible to run with the nmstate backend.

2019-12-11  Ales Musil  <amusil@redhat.com>

	net, nmstate: Flush QoS if needed
	On network edit, if QoS is not specified, it implies that the
	QoS config needs to be removed.

2019-12-11  Edward Haas  <edwardh@redhat.com>

	docker, Dockerfile.func-network-centos-8: Drop nmstate copr
	The nmstate package is now provided through ovirt-release and therefore
	there is no need to explicitly add a nmstate copr.

2019-12-10  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: change start API to use single config object
	Change current incremental backup API to use a single config dict
	object that contains all the needed data for the backup instead of
	multiple separated parameters.

	Using a single wrapping object will give us the agility to add/remove
	parameters without breaking the API and will reduce code.

	This patch doesn't break any existing API since incremental backup
	feature is not supported yet and shouldn't be used by anyone.

2019-12-09  Ales Musil  <amusil@redhat.com>

	net, nmstate, tests: Use nameservers reported from nmstate

	net, tests: Add specific dns for container
	Docker and Podman copy the host's resolv.conf
	into the container. The setup networks will fail
	with OVS if there are more than 3 nameservers,
	and with nmstate if there are more than 2 nameservers.

2019-12-09  Marcin Sobczyk  <msobczyk@redhat.com>

	supervdsm: Cleaner shutdown procedure
	Supervdsmd has an issue that occurs randomly during shutdown:

	 systemd[1]: Stopping Auxiliary vdsm service for running helper functions as root...
	 daemonAdapter[12446]: Traceback (most recent call last):
	 daemonAdapter[12446]:   File "/usr/lib64/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
	 daemonAdapter[12446]:     finalizer()
	 daemonAdapter[12446]:   File "/usr/lib64/python3.6/multiprocessing/util.py", line 186, in __call__
	 daemonAdapter[12446]:     res = self._callback(*self._args, **self._kwargs)
	 daemonAdapter[12446]: FileNotFoundError: [Errno 2] No such file or directory: '/var/run/vdsm/svdsm.sock'

	This exception causes systemd to actually restart the service instead
	of shutting it down.

	In cleanup stage of supervdsm's shutdown procedure we delete the
	socket file if it exists, but multiprocessing facilities are doing that
	by themselves when shut down properly. The reason we experience this
	error randomly is most probably the ugly way we treat supervdsm's
	serving daemonic thread.

	This patch adds a more polite way of shutting down - we make
	a connection to the server, call the 'server.shutdown' method
	and join on the serving thread.

	After implementing this change, the 'FileNotFoundError' exception occurs
	100% of the time, meaning we give multiprocessing facilities a way
	to shut down properly. We therefore don't need to delete the socket file
	by ourselves anymore.

	Bug-Url: https://bugzilla.redhat.com/1778638
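	The shutdown sequence above can be sketched with stdlib pieces
	(socketserver stands in for supervdsm's multiprocessing server;
	this is an illustration, not vdsm's actual code):

```python
import socketserver
import threading


class Handler(socketserver.BaseRequestHandler):
    def handle(self):
        self.request.sendall(b"ok")


# Serve on an ephemeral port from a daemonic thread, like a daemon
# serving requests in the background.
server = socketserver.TCPServer(("127.0.0.1", 0), Handler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# Polite shutdown: ask the server to stop serving, join the serving
# thread, then close the socket so cleanup runs in a defined order.
server.shutdown()
thread.join(timeout=5)
server.server_close()
```

	Joining the serving thread is what gives the library's own
	finalizers a chance to run in a predictable order.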

2019-12-04  Pavel Bar  <pbar@redhat.com>

	constants, storage, constants: Use size constants for MiB
	- Use "MiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.
	- Also remove an incorrect comment and fix 2 comment typos.

	constants, fileSD: Use size constant for MiB
	- Use "MiB" constant instead of other (deprecated) constants.

	constants, directio: Use size constant for KiB
	- Use "KiB" constant instead of magic numbers.

2019-12-04  Milan Zamazal  <mzamazal@redhat.com>

	libvirt: Don't link migration certificates to Vdsm certificates
	Vdsm certificates are used for libvirt connections.  Migration should
	use a separate set of certificates, signed by a separate authority, in
	order to prevent QEMU processes from accessing libvirt daemons.

	The migration certificates must be created and provided by Engine.

	Bug-Url: https://bugzilla.redhat.com/1739557

2019-12-04  Eyal Shenitzky  <eshenitz@redhat.com>

	backup: replace backupEnd with abortJob
	Libvirt now uses abortJob to end a backup job instead of using a unique
	API call.

2019-12-04  Tomasz Baranski  <tbaransk@redhat.com>

	virt: Add CPU features to host capabilities
	Adding CPU features read from libvirt's domcapabilities to the cpuFlags.

	CPU features are exposed by Intel CPUs via the MSR register. For the
	Cascadelake series of CPUs (and some earlier models, but we only care
	about Cascadelake+) Intel reports fixes to vulnerabilities with the MSR
	register (using arch_capabilities), not with regular CPU flags. In order to
	correctly choose a CPU type in the engine, it needs to have access to
	both flags and features. Future CPU configurations will use a
	combination of both.

	The change is fully backwards-compatible. 4.4 engine will continue to
	work with pre-4.4 hosts. 4.4 hosts will not break pre-4.4 engines
	either. Extra items in 'cpuFlags' field are ignored by the selection
	code. Also, arch_capabilities was not backported to el7 qemu so it's
	only available since 4.4.

	Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1693634

2019-12-03  Kaustav Majumder  <kmajumde@redhat.com>

	py3: gluster: Remove usage of deprecated 'execCmd' in gluster/api.py

	py3-gluster: Fixed deprecated APIs in blivet

	py3: gluster: Passing list instead of string to maintain consistency
	Follow up on comment:
	 https://gerrit.ovirt.org/#/c/104723/15/lib/vdsm/gluster/cli.py@113

2019-12-03  Pavel Bar  <pbar@redhat.com>

	constants, blockVolume: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.
	    c) Other variations to receive the constants integer values.

	constants, blockSD: Use size constants for MiB, GiB & LEASES_SLOTS
	- Use "MiB", "GiB", "LEASES_SLOTS" & "XLEASES_SLOTS" constants instead of:
	    a) Magic numbers.
	    b) Other constants.
	    c) Other variations to receive the constants integer values.

	constants, sd: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of magic numbers
	calculated using shift left operator.

	constants, asyncevent: Use size constant for KiB
	- Use "KiB" constant (and its multiples) instead of magic numbers.

2019-12-03  Ales Musil  <amusil@redhat.com>

	net, nmstate: Add QoS support

	net, nmstate: Add _get_base_interface helper

2019-12-02  Kaustav Majumder  <kmajumde@redhat.com>

	py3: gluster: Remove usage of deprecated 'execCmd' in events.py

2019-12-02  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, tests, nmstate: tag rollback tests for nmstate
	With patch [0], all rollback functional tests can be run against
	the nmstate backend.

	[0] - https://gerrit.ovirt.org/#/c/103643/

	net, tests, nmstate: fixes the rollback functests
	The rollback tests try to roll back the network state whenever
	the connectivity check fails.

	When the setupNetworks command fails - which it does in this
	scenario since the connectivity check is not successful - the
	failed transaction is replayed, thus removing all the elements that
	might have leaked in the failed transaction.

	Since the setup_nmstate function never updated the
	configuration, the 'delete' transaction fails because
	most of the objects being deleted do not exist.

	Saving the configuration before attempting the connectivity check
	fixes this.

	This happens because vdsm's nmstate wrapper uses the RunningConfig
	object instead of using the KernelConfig object.

2019-12-02  Kaustav Majumder  <kmajumde@redhat.com>

	gluster-tests: Added tests for gluster/cli module

	py3: gluster: Remove usage of deprecated 'execCmd' in thinstorage.py
	'commands.execCmd' function has been deprecated in favor
	of 'commands.{run,start}' functions. This patch replaces its usage
	in 'gluster' module.

	py3: gluster: Remove usage of deprecated 'execCmd' in cli.py
	Fixed hostUUID and peerStatus

2019-12-02  Milan Zamazal  <mzamazal@redhat.com>

	spec: Switch default target_py to py3
	We should no longer build with Python 2 by default.

2019-12-02  Bell Levin  <blevin@redhat.com>

	net: Add exception handling for double mirroring
	Having a vm with multiple interfaces, each has a vnic profile with
	port mirroring enabled on it - the port mirroring will be
	enabled multiple times on the bridge.

	On a new version of the kernel, a different exception is thrown
	and thus is not handled in our code.

	Add handling of the new exception.

	Bug-Url: https://bugzilla.redhat.com/1765018

2019-12-02  Kaustav Majumder  <kmajumde@redhat.com>

	py3: gluster: Remove usage of deprecated 'execCmd' in gfapi.py

2019-12-02  Milan Zamazal  <mzamazal@redhat.com>

	spec: Build hooks and vhostmd rpm's by default
	The extra packages may or may not be useful upstream, but they are
	harmless in any case.  Let's reduce the need for downstream hacks by
	unifying the set of packages built upstream and downstream.

	Backport-To: 4.3

	spec: Omit sequence numbers from timestamped release numbers
	Commit sequence numbers make little sense when used with timestamped
	release numbers.  The purpose of timestamped release numbers is to
	make Vdsm updates on hosts easier during development, by always
	producing newer versions that we can easily upgrade to.  But when
	prefixed with a commit sequence number, switching between development
	branches or different commits doesn't work this way, or we can easily
	be behind upstream repos available on the system.

	Let's drop commit sequence numbers from the timestamped release
	numbers, we won't miss anything.

2019-12-02  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	network: update known exceptions
	Comply with the exception string update introduced in [0].

	[0] - https://github.com/NetworkManager/NetworkManager/ \
	  commit/d35d3c468a304c3e0e78b4b068d105b1d753876c

2019-12-01  Pavel Bar  <pbar@redhat.com>

	storage, hsm: fix pylint warnings (division)
	Fixing a warning produced by "pylint --py3k" command.
	Warning: "division w/o __future__ statement (old-division)".

	Solution:
	1) Added a "division" import from "__future__".
	2) Updated the division to be integer division in both
	Python 2 and Python 3.
	3) Minor exception message update.
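	The fix pattern, in brief (a generic illustration of the
	`__future__` import and explicit integer division, not the exact
	hsm code):

```python
from __future__ import division  # on Python 2, make "/" true division

# With the import, "/" behaves identically on Python 2 and 3, and
# integer division has to be spelled explicitly with "//".
print(7 / 2)   # 3.5
print(7 // 2)  # 3
```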

2019-11-30  Amit Bawer  <abawer@redhat.com>

	storage, resourcemanager: Remove Owner.ownedResources() unused API

	storage, resourcemanager: Remove Owner.requests dict
	Since we do not manage resource requests in resource Owner, we
	can remove the requests dict and its related getters and their
	usage in Task state transition condition (which is always true):

	- Owner.requestedResources()
	- Owner.requestsGranted()

	Removal of unused code improves CI coverage for resourceManager
	from 83% to 84%.

	storage, resourcemanager: Turn Owner.release() method private

	storage, resourcemanager: Remove Owner.cancelAll() API and its usage
	This API is used for cancelling all resource requests managed by
	the resource Owner, but since we do not manage the requests there,
	we can remove the ineffective API and its calls.

	storage, resourcemanager: Remove unused Owner.wait() API
	This is used to wait upon a specific managed resource request
	but we do not manage requests in the Owner anyway.

	Removal of unused code improves CI coverage for resourceManager
	from 81% to 83%.

	storage, resourcemanager: Remove unused Owner.cancel() API
	This is used to cancel a specific managed resource request
	but we do not manage requests in the Owner anyway.

	Removal of unused code improves CI coverage for resourceManager
	from 79% to 81%.

	storage, resourcemanager: Remove unused Owner.register() API
	Also remove its exclusively used methods and exception:

	- Owner._registerResource()
	- Owner._onRequestFinished()
	- Owner._granted()
	- Owner._canceled()
	- Task.resourceRegistered()
	- ResourceDoesNotExist

	Removal of unused code improves CI coverage for resourceManager
	from 71% to 79%.

2019-11-30  Vojtech Juranek  <vjuranek@redhat.com>

	lvm: remove replaceLVTag()
	lvm.replaceLVTag() does the same thing as changeLVsTags(), but only for
	a single deleted and added tag. Moreover, it is not tested. Replace it with
	changeLVsTags().

	This change doesn't change test coverage.

	lvm: remove unused addtag()
	lvm.addtag() is not used except in a couple of places in the test harness.
	Remove this function and replace it with changeLVsTags() in tests.

	This increases test coverage from 75% to 76%.

	storage: always call lvm.removeLVs() with tuple of LVs
	lvm.removeLVs() accepts an iterable of LVs to be removed. It can also be
	called with a single string value representing a single LV, as it calls
	lvm.normalize_args() before using it. However, this is just a workaround
	to make the argument iterable; one should always pass a tuple of LVs
	into this function so that normalize_args() can be removed in the
	future.

	Replace the normalize_args() function with a check that verifies lvs is
	iterable (either a list or a tuple).
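	A minimal sketch of such a check (the helper name is illustrative,
	not vdsm's actual code):

```python
def check_lvs_arg(lvs):
    # Insist that callers pass a list or tuple of LV names instead of
    # silently wrapping a bare string the way normalize_args() did.
    if not isinstance(lvs, (list, tuple)):
        raise ValueError("lvs must be a list or tuple, got %r" % (lvs,))
    return lvs


check_lvs_arg(("lv1", "lv2"))  # accepted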

	storage: activate/deactivate LVs before/after zeroing
	When preparing LVs for removal, we amend their tags and then activate
	them. This is not needed if we remove them immediately. Remove activation
	of LVs from this function and call it where it is really needed, before we
	zero the LVs. Also deactivate the LVs after that, so that we are not
	removing active LVs.

	Activate LVs one by one instead of in bulk. If activation of one
	of the LVs fails, the whole operation fails. When we do it separately for
	each LV, only the faulty LVs fail. Running a single lvm command for each LV
	may be a little bit worse from a performance point of view, but reliability
	is more important here.

	Bug-Url: https://bugzilla.redhat.com/1639360
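	The one-by-one strategy can be sketched as follows (the helper and
	its error handling are illustrative, not vdsm's actual code):

```python
def activate_one_by_one(lvs, activate):
    # Run a separate activation for each LV so a single faulty LV
    # fails alone instead of failing the whole bulk command.
    failed = []
    for lv in lvs:
        try:
            activate(lv)
        except OSError:
            failed.append(lv)
    return failed


def fake_activate(lv):
    if lv == "bad":
        raise OSError("activation failed")


print(activate_one_by_one(["lv1", "bad", "lv2"], fake_activate))  # ['bad']
```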

	storage: don't use changelv() if there's equivalent lvm function
	Don't use the general changelv() function when preparing volumes for
	deletion. Instead, use dedicated lvm functions: changeLVsTags() for
	amending the tags and activateLVs() for activating LVs.

	Bug-Url: https://bugzilla.redhat.com/1639360

	lvm: allow changeLVTags() to accept tuple of LVs
	Allow lvm.changeLVTags() to accept tuple of LVs, so that we can
	change multiple LVs in one call of this function. Rename the function
	to changeLVsTags() to reflect this change also in its name.

	Also adjust appropriate test to change tags on multiple LVs.

	Bug-Url: https://bugzilla.redhat.com/1639360

	lvm: remove unused function
	Function lvm.addLVTags() is not used anywhere and is not even covered
	by tests. Remove this function.

	storage: Separate lvm commands
	Executing multiple lvm commands at once is not recommended
	and can lead to undefined results. For a more detailed explanation
	of why this is possible but not advised, see
	https://www.redhat.com/archives/linux-lvm/2018-October/msg00017.html

	Bug-Url: https://bugzilla.redhat.com/1639360

2019-11-29  Amit Bawer  <abawer@redhat.com>

	storage, resourcemanager: Remove unused API flow for unregisterNamespace
	This patch removes the following methods along with the API:

	resourceManager.unregisterNamespace()
	resourceManager._unregisterNamespaceLocked()

	Removal of unused code improves CI coverage for resourceManager
	from 70% to 71%.

2019-11-29  Benny Zlotnik  <bzlotnik@redhat.com>

	hsm,image: deprecate methods
	copyImage, cloneStructure and syncData are now deprecated.
	ovirt-engine 4.4, by default, uses only SDM operations.
	- cloneStructure is replaced by a series of calls to createVolume
	- copyImage and syncData are replaced by SDM.copy_data

2019-11-29  Milan Zamazal  <mzamazal@redhat.com>

	spec: Use explicit versions
	Vdsm versions are retrieved automatically from git and tags.  This
	works well with most commits but doesn't cooperate very well with
	tagging.

	There are currently several problems with tagging:

	- When a commit is tagged with a new version, the commit is built with
	  the old version (before it is tagged) in Jenkins and must be
	  re-built after tagging again to change its version to the new one.

	- When a new stable branch is created, the very next master patch,
	  once it is committed, must be tagged with a new master version.  And
	  until this is done, all the master commits are built with the old
	  version in Jenkins and possibly elsewhere.

	- Builds from commits in different branches may have the same
	  VERSION-RELEASE (minus git hash) number until the master branch is
	  tagged.

	These are not very serious problems, but the versioning and releasing
	process is still somewhat messy.  We can improve it, by putting an
	explicit version number to the spec, making version bump commits on
	each release tagging, and ensuring that new versions are used even
	before the tags are created.  This approach, addressed by this patch,
	has the following advantages:

	- Releases are clearly identified by the version bump commits and they
	  are visible e.g. in ChangeLog.

	- Jenkins builds new releases with proper versions immediately and
	  they can be used as upstream sources without further steps.

	- There are no double-version or old-version-only builds for tagged
	  commits.

	- The version bump commit can be tagged immediately when branching,
	  without waiting for a followup commit to be merged.  We branch out
	  from the preceding commit.

	We also don't tag, unlike Engine, releases from master branch.  This
	is confusing, especially regarding resulting upstream and downstream
	versions, which deviate.  This patch doesn't address that problem; all
	that is needed is to start making version change commits also for
	releases created from master.

	Backport-To: 4.3

2019-11-29  Marcin Sobczyk  <msobczyk@redhat.com>

	build: Drop 'vdsm_python_helpers.m4' macros
	Since we've dropped py2 support, we don't need the m4 macros that
	helped us support/target different interpreter versions. Let's bury them
	deep in the ground.

	tox: Unify py3 jobs
	The py36 and py37 environments' definitions are identical - let's merge
	them into ones without any suffixes for clarity.

	tox: Drop py2 jobs
	This patch drops all py2 tox envs.

	build: Change all shebangs to python3

	autogen.sh: Drop support for Python 2
	This patch drops the possibility to configure the build tree with
	'autogen.sh' to support py2. All of the interpreter-selection options
	('--with-only-python', '--with-target-python', etc.) for 'autogen.sh'
	are being removed and everything works with py3 out of the box.

2019-11-28  Amit Bawer  <abawer@redhat.com>

	readme: Update the automation/check-patch.packages to fc30 and el8 version
	Old fc29 and el7 packages files are no longer part of the automation
	folder.

2019-11-28  Nir Soffer  <nsoffer@redhat.com>

	fileSD: Remove leading / in ioprocess client name
	Client name "/server:_path" is confusing; use "server:_path".

	fileSD: Ensure client_name is defined
	mountPoint is always inside /rhev/data-center/mnt so we don't need to
	check it. This ensures that client_name is always defined.

	Reported-by: Pavel Bar

2019-11-28  Milan Zamazal  <mzamazal@redhat.com>

	spec: Add a more meaningful changelog
	The current %changelog entry in vdsm.spec.in is not meaningful in any
	way.  Let's add a real ChangeLog to /usr/share/vdsm/doc and replace
	spec %changelog content with a reference to it.  This is better than
	including the whole ChangeLog, which is > 4 MB, in each of the built
	rpm files.

	We still keep some date etc. in %changelog to make rpm happy.

	Backport-To: 4.3

2019-11-28  Ales Musil  <amusil@redhat.com>

	net, nmstate, tests: Add workaround for dynamic func tests
	nmstate's default behavior for DHCP has changed to
	non-blocking. Add a basic sleep to give the DHCP
	response a chance to arrive.

2019-11-27  Ales Musil  <amusil@redhat.com>

	net, nmstate, tests: Mark net_with_bond_test as nmstate
	Most of the tests from the net_with_bond_module can be run
	after nmstate PR [0].
	The only exception is:
	test_add_vlan_network_on_existing_external_bond_with_used_slave

	[0] https://github.com/nmstate/nmstate/commit/93accb61490e4216132de0d712e8267cc3ab8af9

	net, nmstate: Move validation for nic used by bond
	Nic usage validation was handled separately for OvS
	and ifcfg. Move it to the common validation function
	so it can also be used by nmstate.

	net, tests: Add unit tests for used nic validation

2019-11-27  Marcin Sobczyk  <msobczyk@redhat.com>

	build: Drop 'vdsm-tests' package
	The 'vdsm-tests' package has been broken and unused for a long time -
	let's drop it.

	Since the 'tests/run_tests_local.sh' file is not part of the dist package
	anymore, it is not built along with the 'all' target. The 'check-*' targets
	from 'tests/Makefile.am' have an explicit dependency on this script,
	so they will do fine, but a small adjustment to 'contrib/shell_helper'
	has been added to make sure the script is generated before trying to run
	the tests.

	tests: Remove 'run_tests.sh' script
	The only use case for 'run_tests.sh' script was to run tests installed
	along with 'vdsm-tests' package. Since we're aiming at removal of this
	package, this script is no longer needed.

	tests: Remove 'makecert.sh' script
	In an effort to drop 'vdsm-tests' package, we need to get rid
	of build-time generated SSL keys and certificates created
	by 'tests/makecert.sh' script and the script itself.

	This patch removes the script and all the leftovers of global SSL keys,
	certs and contexts.

	tests: Remove 'DEAFAULT_SSL_CONTEXT' dependency in 'constructClient'
	In an effort to drop 'vdsm-tests' package, we need to get rid
	of build-time generated SSL keys and certificates created
	by 'tests/makecert.sh' script and the script itself.

	This patch changes the 'constructClient' helper from
	'integration.jsonRpcHelper' module.

	This method used to take an 'ssl' argument which decided whether
	or not to pass an SSL context to 'constructAcceptor' and for socket
	creation. When called with 'ssl=True', the global 'DEAFAULT_SSL_CONTEXT'
	was used.

	The 'ssl' parameter has been changed to 'ssl_ctx', which allows passing
	the context directly. This removes the dependency
	on 'DEAFAULT_SSL_CONTEXT'.

	'jsonrpcserver_test' and 'stomprpcclient_test' modules were clients of
	this helper, so they've been adapted to the changes.

	tests: Remove 'DEAFAULT_SSL_CONTEXT' dependency in 'constructAcceptor'
	In an effort to drop 'vdsm-tests' package, we need to get rid
	of build-time generated SSL keys and certificates created
	by 'tests/makecert.sh' script and the script itself.

	This patch changes the 'constructAcceptor' helper from
	'integration.jsonRpcHelper' module.

	This method used to take an 'ssl' argument which decided whether
	or not to pass an SSL context to 'MultiProtocolAcceptor'. When called
	with 'ssl=True', the global 'DEAFAULT_SSL_CONTEXT' was used.

	The 'ssl' parameter has been changed to 'ssl_ctx', which allows passing
	the context directly. This removes the dependency
	on 'DEAFAULT_SSL_CONTEXT'.

	'stomp_test' module has been the only client of this helper, so it's
	been adapted to the new way of working.

	tests: Restore 'jsonRpcTests' module
	'jsonRpcTests' module has been hiding in the shadows for a long time -
	it was moved from the 'tests/' directory to 'tests/integration' in [1],
	but at the same time accidentally removed from the list of test modules
	to be run. Apart from the usage of the no longer existing
	'stomp.Disconnected' exception [2] and some minor linting issues, it
	seems to be doing fine though. Let's remove the obsolete tests and
	restore the rest of the module.

	[1] https://github.com/oVirt/vdsm/commit/0d591d352a81bddb3dc4e8b444fb1098900f057c
	[2] https://github.com/oVirt/vdsm/commit/e62cc7c4269a5a93d7c533bb740ca46ad3c0926a

	tests: sslhelper: Add 'create_ssl_context' helper
	In an effort to drop 'vdsm-tests' package, we need to get rid
	of build-time generated SSL keys and certificates created
	by 'tests/makecert.sh' script and the script itself.

	This patch adds a 'create_ssl_context' helper to 'integration.sslhelper'
	module that creates an SSL context.

	While at first sight it might seem unnecessary and overkill,
	it serves an important role - to get rid of our own
	'sslutils.SSLContext' class in the end. Having an intermediary instead
	of using the class directly will ease the transition later.

	Along with the helper's introduction, usage of global
	'DEAFAULT_SSL_CONTEXT' has been replaced with a locally created context.

	tests: Helpers for generating SSL certificates
	In an effort to drop 'vdsm-tests' package, we need to get rid
	of build-time generated SSL keys and certificates created
	by 'tests/makecert.sh' script and the script itself.

	This patch adds two public helpers to 'integration.sslhelper' module:

	- generate_key_cert_pair: A context manager that creates a pair of
	  matching, tempfile-based SSL key and certificate

	- key_cert_pair: A session-scoped pytest fixture that uses
	  'generate_key_cert_pair' internally

	The commands for generating keys and certificates are extracted from
	'makecert.sh' script. Generated files are tempfile-based so they're
	automatically removed after the contextmanager regains control from
	'yield'.

	As a proof of correctness, the 'ssl_test' module has been refactored
	to use the new runtime-generated SSL certificates.
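	The helper's tempfile-based shape can be sketched like this (the
	real helper fills the files by running the openssl commands from
	'makecert.sh'; placeholder bytes are used here):

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def generate_key_cert_pair():
    key = tempfile.NamedTemporaryFile(suffix=".pem", delete=False)
    cert = tempfile.NamedTemporaryFile(suffix=".pem", delete=False)
    try:
        # Placeholder contents; the real helper shells out to openssl.
        key.write(b"-----BEGIN PRIVATE KEY-----\n")
        cert.write(b"-----BEGIN CERTIFICATE-----\n")
        key.close()
        cert.close()
        yield key.name, cert.name
    finally:
        # Tempfile-based cleanup once the caller is done.
        os.unlink(key.name)
        os.unlink(cert.name)


with generate_key_cert_pair() as (key_path, cert_path):
    assert os.path.exists(key_path) and os.path.exists(cert_path)
```

	Because cleanup lives in the `finally` block after `yield`, the
	files are removed automatically when the context manager regains
	control.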

	py3: tests: Move 'client_test' module to tox
	In an effort to clean up 'tests' directory, we're moving
	'client_test' module to 'tests/lib/yajsonrpc' dir and run it with tox.

	The tests inside the module were skipped for py3, but it turns out that
	they're now working fine with this interpreter.

	The module has also been renamed to more accurately reflect the tested
	codebase.

	tests: make: Add 'run_tests_local.sh' dependencies
	Some 'check-' targets in 'tests/Makefile.am' require
	'tests/run_tests_local.sh' script to be working. This script is
	generated from the 'tests/run_tests_local.sh.in' file. Thus, the check
	targets should depend on the script target. This patch adds these
	dependencies.

2019-11-27  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	automation: add the NetworkManager-1.20 repo
	Until CentOS 8.1 is released, we need to pull NetworkManager-1.20
	from copr.

	This is needed since oVirt-4.4 will target CentOS 8.1, which in turn
	will ship NM-1.20.

2019-11-27  Marcin Sobczyk  <msobczyk@redhat.com>

	stomp: Remove 'Disconnected' exception definition
	'stomp.Disconnected' exception was introduced in [1], but its
	single usage was dropped in [2]; let's drop the exception as well.

	[1] https://github.com/oVirt/vdsm/commit/5ac06445fe01f4791ada12c8a3be28f3255466a2
	[2] https://github.com/oVirt/vdsm/commit/e62cc7c4269a5a93d7c533bb740ca46ad3c0926a

	tests: Remove 'other' certificate
	The 'other' certificate created by 'tests/makecert.sh' doesn't seem
	to be used anywhere - let's drop it.

2019-11-26  Milan Zamazal  <mzamazal@redhat.com>

	hostdev: Provide mdev type descriptions
	The list of mdev device types in the Engine web UI should provide
	human readable descriptions of the types.  libvirt API doesn't expose
	such information [1].  However mdev devices can optionally expose some
	information about mdev types in `description' files in the
	corresponding mdev type directories.  Let's read it, if available, and
	add it to the other information about host devices returned to Engine.

	[1] https://www.redhat.com/archives/libvirt-users/2019-November/msg00017.html

2019-11-26  Amit Bawer  <abawer@redhat.com>

	tests, storage, resourcemanager: Remove VdsmTestCase and permutations
	Use pytest parametrization instead.

	tests, storage, resourcemanager: Remove legacy try-fail blocks
	Use pytest.raises() when looking for expected exceptions.

	tests, storage, resourcemanager: Use pytest asserts for tests

	tests, storage, resourcemanager: Use external logger

2019-11-26  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	tests, net, nmstate: remove vlan MTU workaround
	After NetworkManager's fix for bug [0], the vlan iface MTU
	workaround can be safely removed from vdsm, as it is implemented
	directly in nmstate.

	[0] - https://bugzilla.redhat.com/1751079

2019-11-25  Nir Soffer  <nsoffer@redhat.com>

	misc: Remove namedtuple2dict
	namedtuple._asdict()[1] does what we need since Python 2.7.

	[1] https://docs.python.org/3.6/library/collections.html#collections.somenamedtuple._asdict
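	For example:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

# _asdict() returns a mapping of field names to values, which is all
# the removed namedtuple2dict helper provided.
print(dict(p._asdict()))  # {'x': 1, 'y': 2}
```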

2019-11-25  Eyal Shenitzky  <eshenitz@redhat.com>

	fileUtils: change createdirs tests behavior according to new makedirs
	The Python 3.7 makedirs implementation doesn't force the given mode
	on the intermediate directories -
	https://bugs.python.org/issue19930

	This patch changes the tests for intermediate directories creation to
	support the new os.makedirs() behavior.
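	The changed behavior can be observed directly (the umask is pinned
	so the resulting modes are deterministic):

```python
import os
import stat
import tempfile

old_umask = os.umask(0o022)
try:
    with tempfile.TemporaryDirectory() as tmp:
        leaf = os.path.join(tmp, "a", "b")
        os.makedirs(leaf, mode=0o700)
        # The leaf directory honors the requested mode...
        assert stat.S_IMODE(os.stat(leaf).st_mode) == 0o700
        # ...but since Python 3.7 the intermediate directory gets the
        # default mode filtered by the umask, not the requested one.
        mid = stat.S_IMODE(os.stat(os.path.join(tmp, "a")).st_mode)
        assert mid == 0o755
finally:
    os.umask(old_umask)
```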

	storagetestlib: move get_umask() from oop test to storagetestlib
	Move get_umask() from outofprocess_test.py to storagetestlib.py
	in order to use it in other tests.

2019-11-25  Bell Levin  <blevin@redhat.com>

	CI: Add container env variable for systemd support

2019-11-22  Milan Zamazal  <mzamazal@redhat.com>

	hooks: Remove hostdev_scsi hook
	The corresponding, and more robust, functionality has been implemented
	in Engine.

2019-11-21  Pavel Bar  <pbar@redhat.com>

	storage, hsm: fix hsm.extendVolume division bug
	- There was a real issue in the "extendVolume" method that
	was discovered due to pylint division warnings.
	The new size was mistakenly rounded down instead of
	rounded up, due to the integer division in Python 2.
	Used the correct utils.round() implementation, which
	also eliminated the pylint warning itself.
	Added a test with 3 test cases.
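	The round-up pattern, in brief (round_up below is a stand-in for
	the utils helper used by the patch; the 128 MiB extent size is
	only an example):

```python
def round_up(n, size):
    # Round n up to the next multiple of size using integer division,
    # which behaves the same on Python 2 and Python 3.
    return ((n + size - 1) // size) * size


MiB = 1024 * 1024
EXTENT = 128 * MiB

# Plain integer division truncates, shrinking the requested size:
print((200 * MiB // EXTENT) * EXTENT // MiB)  # 128
# Rounding up yields the next full extent:
print(round_up(200 * MiB, EXTENT) // MiB)     # 256
```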

2019-11-21  Vojtech Juranek  <vjuranek@redhat.com>

	tests: add more device mapper tests
	This patch adds more device mapper tests. These tests increase test
	coverage from 46% to 80%; what remains uncovered is mostly error flows
	where an exception is thrown. As the tests use zero device mapping, the
	tests (and code coverage) unfortunately won't be run in CI.

	tests: add test for removing dm mapping
	Add test for removing device mapper mapping. The test uses real mapping
	created by zero_dm_device fixture introduced in previous commit.

	The device mapper doesn't work properly on Docker; it hangs during
	creation of the zero target mapping, waiting for a semaphore:

	    ioctl(3, DM_DEV_CREATE, 0x55eda20de290) = 0
	    ioctl(3, DM_TABLE_LOAD, 0x55eda20de270) = 0
	    ioctl(3, DM_DEV_SUSPEND, 0x55eda20de170) = 0
	    semget(0xd4de04a, 1, 0)                 = 0
	    semctl(0, 0, GETVAL, 0xffffffff)        = 2
	    semop(0, [{0, -1, IPC_NOWAIT}], 1)      = 0
	    semop(0, [{0, 0, 0}], 1

	The test times out in the container, so we have to skip it on
	oVirt CI and Travis.

	tests: add dm mapping fixture
	Add a fixture which creates a test device mapper mapping. This mapping
	uses zero target, which is supposed to be used for device mapper tests
	and acts like /dev/zero. For now, the size of the device is fixed to
	1 GiB.

	The tests using this fixture need to run with root privileges as
	dmsetup utility needs root privileges. The fixture will be used in
	subsequent commit for real device mapper tests.

2019-11-21  Michal Skrivanek  <michal.skrivanek@redhat.com>

	spec: require kernel for CVE-2019-14835

2019-11-20  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, nmstate: purge vlan from previous southbound device
	When a VLANed network is moved from one southbound interface
	to another, the existing VLAN device is not deleted.

	This patch addresses that, by updating the nmstate API translation
	to generate a state that also features the previous interface
	deletion.

	net, tests, nmstate: Vlan iface is not removed on network updates
	This test mimics the following scenario:
	- t = 0: {net1: {nic: eth0, vlan: 100}}
	- t = 1: {
	           net1: {nic: eth1, vlan: 100},
	           net2: {nic: eth0, vlan: 100},
	         }

	The goal of this scenario is to assure that when a VLAN is moved
	from one interface to another, and a new VLANed network is created
	over the old base interface, the old vlan interface is not deleted.

	net, tests, nmstate: Vlan iface is not removed
	When a vlan interface is moved from one southbound interface
	to another, the 'old' vlan interface leaks when nmstate is used
	as the networking backend.

	This patch exposes this behavior via an xfailed functional test.

2019-11-20  Milan Zamazal  <mzamazal@redhat.com>

	py3/storage: Use `msg' instead of `message' in exceptions
	This is to not confuse Python 3 linters with presence of the obsolete
	`message' attribute in exceptions.

	py3/tests: Use `msg' instead of `message' in exceptions
	This is to not confuse Python 3 linters with presence of the obsolete
	`message' attribute in exceptions.

2019-11-20  Pavel Bar  <pbar@redhat.com>

	constants, hsm: Use size constants for MiB, GiB
	- Use "MiB" and "GiB" constants instead of:
	    a) Magic numbers.
	    b) Other constants.
	    c) Other variations to receive the constants integer values.

	constants, blkdiscard: Use size constants for MiB
	- Use "MiB" constant instead of magic numbers.

	imports: sorting imports in alphabetical order

2019-11-20  Vojtech Juranek  <vjuranek@redhat.com>

	tests: move broken_on_ci into marks module
	In subsequent patches we will need the broken_on_ci marker. Move the
	existing one into the marks module. A default reason is provided, but
	it can be specified in each module so that we can provide more details
	on why the tests don't work on CI.

2019-11-20  Edward Haas  <edwardh@redhat.com>

	net, tests: Use 'ovirt' instead of 'ovirtorg' image prefix
	The network functional test container is now maintained on quay.io
	under the 'ovirt' organization.

2019-11-20  Ales Musil  <amusil@redhat.com>

	net, tests: Mark OvS IPv6 dynamic tests as xfail
	Bug-Url: https://bugzilla.redhat.com/1773471

2019-11-19  Nir Soffer  <nsoffer@redhat.com>

	travis: Do not allow failures in el8 build
	At this point vdsm tests on el8 should be stable.

	tests: Fix el8 tests on travis
	For some reason running "sleep" shows "/usr/bin/sleep" in
	/proc/<pid>/cmdline. Let's try to always run "/usr/bin/sleep".

2019-11-19  Amit Bawer  <abawer@redhat.com>

	lvm: Set devices/scan_lvs=0 config option for Vdsm lvmlocal.conf
	This is the current default behavior for lvm2 2.02 and 2.03, adding
	this option is a precaution in case it is ever changed.

2019-11-19  Eyal Shenitzky  <eshenitz@redhat.com>

	lvm: add TODO for removing read_only and locking_type
	locking_type=1 is the default in RHEL-8 and Fedora 31 but
	in RHEL-7 and Fedora < 31 locking_type is still needed.

	locking_type=4 isn't needed in RHEL-8 since LVM doesn't
	modify the metadata from read commands anymore; see bug
	https://bugzilla.redhat.com/1553133.

	In RHEL-8 and Fedora > 31, LVM will look at the deprecated
	locking_type value, translate it to an equivalent command
	line option, print a warning message, and ignore it.

	This patch adds a TODO for removing the read_only attribute
	and changing the log level of the failed validation for the
	locking_type in the HSM.

	lvm: set hints="none" in lvmlocal.conf
	In RHEL-8, lvm remembers which devices are PVs so that it can
	avoid scanning other devices that are not PVs.
	But if PVs are created or removed from other hosts,
	the hints might be wrong.

	Setting hints to "none" avoids that situation.

	lvm: set hints="none" in lvm.conf
	In RHEL-8, lvm remembers which devices are PVs so that it can
	avoid scanning other devices that are not PVs.
	But if PVs are created or removed from other hosts,
	the hints might be wrong.

	Setting hints to "none" avoids that situation.
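
	The relevant configuration fragment (a sketch; the exact layout of
	the shipped lvm.conf/lvmlocal.conf may differ):

```
devices {
    # Disable device hints: PVs created or removed from other hosts
    # would make cached hints stale.
    hints = "none"
}
```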

2019-11-18  Marcin Sobczyk  <msobczyk@redhat.com>

	ci: Run 'check-patch.install' sub-stage on el8
	We use 'check-patch.install' sub-stage to verify whether built RPMs are
	installable without errors on different distros. This patch adds an el8
	run of this sub-stage to CI.

	ci: Add missing 'glusterfs' repo
	In 'check-patch.install' ci sub-stage we install all freshly built VDSM
	RPMs. To properly install 'vdsm-gluster' on el8 we need some glusterfs
	packages that are not part of base repos. This patch adds the missing
	glusterfs repo.

	spec: Fix 'hook-httpsisoboot' dependency
	'hook-httpsisoboot' hook was introduced in [1] to allow using
	https-served ISOs without the need to upload them to storage domains.
	Since this feature was introduced to qemu-kvm-rhev somewhere in the
	el7 lifecycle, the spec required a specific version of this package.
	For el8 we can use it right away, and given that the 'vdsm' package
	requires a newer version of 'qemu-kvm', we can drop this requirement
	completely.

	[1] https://github.com/oVirt/vdsm/commit/59534fdf4e23d6719aa81f832de39ba37becf25eq

	spec: Fix 'hook-fileinject' dependency
	'python-libguestfs' dependency of 'hook-fileinject' package is invalid
	for py3 builds. This patch changes it to either 'python-libguestfs' or
	'python3-libguestfs' depending on target interpreter version.

	ci: Change dnf/yum detection mechanism
	This patch changes the way we check whether to use yum or dnf - instead
	of relying on distro name we simply check for availability of
	'/usr/bin/dnf' executable and fallback to yum if it's not available.
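
	The detection amounts to a simple executable check; a minimal sketch
	in Python (the actual CI scripts are shell, and the function name is
	illustrative):

```python
import os

def package_manager():
    """Prefer dnf when available, fall back to yum."""
    return "dnf" if os.path.isfile("/usr/bin/dnf") else "yum"
```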

2019-11-18  Eyal Shenitzky  <eshenitz@redhat.com>

	fileutil_test: remove unneeded classes
	 - remove unneeded classes
	 - use pytest parametrize instead of @permutation
	 - rename tests to contain more context

2019-11-18  Vojtech Juranek  <vjuranek@redhat.com>

	tool: use systemctl for lvm configuration
	We have a systemctl module which provides a show() function that is
	tested. Don't repeat this code, which is moreover not tested, and
	use this standard module for obtaining information about systemd
	units.

2019-11-18  Pavel Bar  <pbar@redhat.com>

	constants: Add a dedicated "units" module for size constants.

	constants: Use size constants for fakesanlock
	- Use ALIGNMENT_XX constants for alignment sizes.
	- Use BLOCK_SIZE_YYY constants for sector sizes.

2019-11-18  Amit Bawer  <abawer@redhat.com>

	storage, filevolume: Use ioprocess for writing volume metadata
	Using the old blocking file writing could hang Vdsm for long periods.
	We leverage the ioprocess error/timeout handling for such cases.

2019-11-18  Nir Soffer  <nsoffer@redhat.com>

	image: Remove unused Image.sparsify()
	Image.sparsify() was added in 3.6 as a GSOC project, but was never used
	by engine or other supported client. In 4.1 we added
	SDM.sparsify_volume() which is used by engine, but we kept the old
	sparsify in case engine will want to use it.

	Moving to RHEL 8 is an opportunity to drop the dead code.

	tests: Mark loopback 4k tests as xfail on el8
	This test was skipped on el7 because --sector-size option was not
	available, and was marked as expected failure on fedora, where this
	option is available, but using it fails randomly in oVirt CI. In el8
	--sector-size is available but fails randomly as well.

2019-11-17  Eyal Shenitzky  <eshenitz@redhat.com>

	fileutil_test: use pytest assert in all tests

2019-11-17  Vojtech Juranek  <vjuranek@redhat.com>

	tool: disable lvmetad only when it's present
	Since lvm2-2.03, lvmetad has been removed from lvm and is not used
	any more. Add a check into the lvm configurator which checks whether
	lvmetad is present and, if not, skips its configuration.

	Systemd masks even a service which doesn't exist, so the current
	configurator also works with lvm2-2.03, but configuring a service
	which doesn't exist is ugly and can lead to unexpected bugs.

	tool: add TODO for removing lvmetad from lvm configurator
	lvmetad is not supported since lvm2-2.03, which is present in RHEL8
	and Fedora 31. We still support Fedora 30, so we still have to keep
	disabling lvmetad, but once we stop supporting it, this should be
	removed. Add a TODO to the appropriate parts of the lvm configurator
	to remove it once we don't support Fedora 30.

	Running this code on Fedora 31 or RHEL8 seems to be harmless, as
	systemd masks the service no matter if it exists or not, but it
	would be nice to skip this configuration if not needed. This will be
	done in subsequent patch.

	lvm: suppress warning about unknown event_activation option
	lvm2-2.03 adds global/event_activation which is enabled by default.
	As it can result in data corruption in our case, we disabled it in a
	previous patch. As we still support Fedora 30, which doesn't recognize
	this option and prints a warning about it in every lvm command,
	suppress this warning. It can be removed once we don't support
	Fedora 30.

	Example warning from vdsm log:

	  2019-11-12 10:17:34,145-0500 WARN  (monitor/3278f18) [storage.LVM] Command ['/usr/sbin/lvm', 'lvchange', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600140524fa508b0f2f4884b361a9489$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }', '--autobackup', 'n', '--available', 'y', '3278f181-c6ad-4d1b-9ac5-4f9489aad371/leases'] succeeded with warnings: ['  Configuration setting "global/event_activation" unknown.', '  Configuration setting "global/event_activation" unknown.', '  Configuration setting "global/event_activation" unknown.'] (lvm:385)

	lvm: disable event activation
	lvm2-2.03 has, besides other changes, removed support for lvmetad,
	which is not used anymore, and introduced event activation, which
	is enabled by default. To prevent unwanted lvm activation of newly
	added devices, which can lead to data corruption, disable event
	activation. Also remove the lvmetad configuration, which is not
	needed anymore.

	As we still support Fedora 30, which uses lvm2-2.02, we still have to
	support the old lvmetad configuration. As lvm ignores options which it
	doesn't recognize, we can keep all configuration in the same file and
	use one config for all versions.

	If lvm finds an unknown option, it ignores it and prints a warning.
	The warnings can be disabled by setting config/checks=0 in lvm.conf.

	The warnings are printed out only on Fedora 30/CentOS 7; on RHEL8
	local/use_lvmetad seems to be silently ignored even when
	config/checks=1.

2019-11-17  Ales Musil  <amusil@redhat.com>

	net, func, tests: Enter the shell just before running the tests

2019-11-17  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	packaging: require nmstate on 4.4

2019-11-17  Ales Musil  <amusil@redhat.com>

	net, nmstate: Fix getting wrong attribute from setup networks
	Fix getting the 'bond' attribute instead of 'bonding'.
	Add a functional test that covers this scenario.

	net, nmstate, tests: Wait for dhcp responses
	Wait for dhcp responses by polling for the existence of the iface
	tracking files. This is needed because for some tests it takes
	longer for the response to arrive. This resulted in unstable
	dynamic tests, because if the dhcp response was not handled in
	time, the gateway assert failed.

	The polling has a timeout of 5 seconds.

	This does not affect production code in any way.
	From the production code point of view there is no time constraint
	for the event to arrive.

2019-11-17  Bell Levin  <blevin@redhat.com>

	net: Do not delete bridge when slave is down
	When performing an ifdown on an unenslaved nic whose ifcfg
	file contains a bridge entry, the bridge is wrongly removed.
	This is a new behavior on el8, and does not reproduce on el7.

	When detecting a nic that is to be enslaved, initialize
	its configuration through iproute2 instead of ifdown.

2019-11-17  Eyal Shenitzky  <eshenitz@redhat.com>

	udev: use compat.get_args_spec() instead of inspect.getargspec()
	inspect.getargspec() is deprecated since Python 3.0.
	Using compat.get_args_spec() makes udev compatible with both
	Python 2.7 and Python 3.7.
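
	A compatibility shim along these lines (a sketch; vdsm's actual
	compat.get_args_spec may differ):

```python
import inspect

def get_args_spec(func):
    """Return the function's positional argument names, using the
    non-deprecated API when it is available."""
    if hasattr(inspect, "getfullargspec"):   # Python 3
        return inspect.getfullargspec(func).args
    return inspect.getargspec(func).args     # Python 2
```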

2019-11-17  Ales Musil  <amusil@redhat.com>

	net, tests: Add option to run functional tests with nmstate backend
	By running container with:
	`sudo bash -c "TEST_NMSTATE=1 ./tests/network/functional/run-tests.sh"`

	Failing tests:
	test_create_network_over_an_existing_unowned_bridge
	Some tests in random order from dynamic_ip_test.py

2019-11-15  Marcin Sobczyk  <msobczyk@redhat.com>

	config: ssl: Conform to crypto policies
	This patch is a squash of:

	https://gerrit.ovirt.org/#/c/104287
	https://gerrit.ovirt.org/#/c/104288
	https://gerrit.ovirt.org/#/c/104467

	Bug-Url: https://bugzilla.redhat.com/1179273

2019-11-15  Vojtech Juranek  <vjuranek@redhat.com>

	tool: fix module config help
	When switching to py3, we started to pass an iterator into the
	configurator help string, so the help output now looks like this:

	    $vdsm-tool is-configured -h
	    usage: vdsm-tool is-configured [-h] [--module STRING]

	    optional arguments:
	      -h, --help       show this help message and exit
	      --module STRING  Specify the module to run the action on (e.g
	                       <dict_keyiterator object at 0x7f8f83f14048>). If non is
	                       specified, operation will run for all related modules.

	Fix it by switching to list() instead of using six.iterkeys().
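
	The root cause in miniature: interpolating a dict key iterator into
	a string yields its repr; converting to a list fixes the help text
	(illustrative module names):

```python
modules = {"multipath": None, "lvm": None}

# Python 3: interpolating the iterator gives an unhelpful repr,
# e.g. "<dict_keyiterator object at 0x...>".
broken = "e.g %s" % iter(modules.keys())

# Converting to a list yields readable help text.
fixed = "e.g %s" % list(modules)
```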

2019-11-15  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: ssl: Remove unused test data
	Neither 'KEY' nor 'CERTIFICATE' seem to be used in the module,
	let's drop them.

2019-11-14  Nir Soffer  <nsoffer@redhat.com>

	travis: Remove Fedora 29 and CentOS 7 builds
	CentOS 7 is not supported in 4.4, and Fedora 29 will be EOL in 2 weeks.
	We don't want to waste time and resources on testing them.

2019-11-14  Edward Haas  <edwardh@redhat.com>

	net, tests: Rename network functional tests container
	Rename the Dockerfile to fit the name template of the existing
	dockerfile names.
	Dockerfile: Dockerfile.func-network-centos-8
	Container name: ovirtorg/vdsm-test-func-network-centos-8

2019-11-14  Amit Bawer  <abawer@redhat.com>

	multipath: Use 'overrides' config section instead of 'devices'
	Since the current 4.4 Vdsm release is not intended to support
	the el7 platform, we can remove the old 'all_devs' config option
	along with the 'devices' section and use the newer 'overrides'
	section already supported in el8 and fc29 or later.

2019-11-14  Steven Rosenberg  <srosenbe@redhat.com>

	v2v: GetStats response message fails in Json
	Importing a VMware VM to an EL8.1 host fails
	due to a binary value that was not being converted
	to a string. This caused the Host.getStats response
	message to fail at the Json layer.

	Adding string conversion to the description field
	mitigated the failure. The issue is isolated to
	the current master 4.4 branch.

	Bug-Url: https://bugzilla.redhat.com/1770889

2019-11-14  Yuval Turgeman  <yturgema@redhat.com>

	packaging: move dracut config to /usr/lib
	libdir is expanded to /usr/lib64, while dracut expects its configuration
	files to be placed under /usr/lib

	Bug-Url: https://bugzilla.redhat.com/1756944
	Bug-Url: https://bugzilla.redhat.com/1760262

2019-11-14  Pavel Bar  <pbar@redhat.com>

	storage, mailbox: fix pylint warnings (division) & bug
	Fixing a warning produced by "pylint --py3k" command.
	Warning: "division w/o __future__ statement (old-division)".
	Actually this warning exposed a real bug in "wait_timeout()"
	function that didn't work correctly in Python2 when passing
	to it an odd integer "monitor_interval" parameter.

	Solution:
	1) Added a "division" import from "__future__".
	This turns "/" into floating point division in both Python versions.
	Additionally, the division of 2 integer numbers, both known at
	compile time, was replaced with a plain floating point literal.
	2) Added scenarios to "wait_timeout()" test.
	The test was missing an odd integer scenario that in the old code
	would fail due to integer division in Python2.
	Rewrote the test using parameters.
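
	The Python 2 pitfall the patch fixes, illustrated (a sketch; the
	real wait_timeout() differs):

```python
# The fix: force true division on Python 2 (a no-op on Python 3).
from __future__ import division

def wait_timeout(monitor_interval):
    # An odd interval (e.g. 3) now yields 1.5 on both Python versions,
    # instead of the truncated 1 that Python 2 integer division gave.
    return monitor_interval / 2
```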

	storage, formatconverter: fix pylint warnings (division)
	Fixing a warning produced by "pylint --py3k" command.
	Warning: "division w/o __future__ statement (old-division)".

	Solution:
	1) Added a "division" import from "__future__".
	2) Updated the division to be integer division in both
	Python 2 and Python 3.

	storage: minor improvements & fixes (typos)
	- Fixed typos in comments.
	- Removed redundant parentheses.

2019-11-14  Edward Haas  <edwardh@redhat.com>

	net, tests: Run network functional tests in a container
	Introduce a dockerfile and a script that runs the network functional
	tests in a container.

	The script needs to be run as a privileged user.

	To build the container, under the vdsm/docker folder, run:
	sudo podman build \
	  --rm \
	  -t \
	  ovirtorg/vdsm-network-functest-centos-8 \
	  -f Dockerfile.network-functest-centos-8 \
	  .

	Usage examples:
	- Run tests based on linux-bridge
	`sudo ./tests/network/functional/run-tests.sh`

	- Open the container shell without executing the test run.
	`sudo ./tests/network/functional/run-tests.sh --shell`
	- At the container shell, you can run the tests:
	```
	pytest \
	  -x \
	  -vv \
	  --log-level=DEBUG \
	  --target-lib \
	  --skip-stable-link-monitor \
	  -m legacy_switch tests/network/functional
	```
	- Run tests based on ovs-switch
	`sudo bash -c "TEST_OVS=1 ./tests/network/functional/run-tests.sh"`

2019-11-14  Marcin Sobczyk  <msobczyk@redhat.com>

	build: Make Python 3 the default version
	This change makes py3 the default interpreter version. The consequence
	of this change is that whenever you simply run:

	 ./autogen.sh

	your build tree will be configured to target the py3 interpreter
	version (as in building RPMs with py3), with py2 only as an option
	(if it's available in your OS).

	stdci: Remove regular el7-based sub-stages from CI
	This patch drops all regular el7-based sub-stages from our CI.
	Functional networking tests are untouched by this change.

2019-11-14  Edward Haas  <edwardh@redhat.com>

	nmstate, mtu: Add slave to a bond with a non default mtu
	Support the ability to add a new slave to a bond and inherit the bond
	mtu.

2019-11-13  Vojtech Juranek  <vjuranek@redhat.com>

	tests: add tests for failing scsi_id
	Add tests for multipath.get_scsi_serial() when serial ID is not
	returned. The patch contains two tests for the following scenarios:
	* scsi_id is called on a non-existing device or a device which
	  doesn't have ID_SERIAL; the return code is zero, but the output
	  doesn't contain the ID_SERIAL field.
	* scsi_id fails completely; there's no output and the return code
	  is non-zero.

2019-11-13  Nir Soffer  <nsoffer@redhat.com>

	tests: Remove fragile qemu-img map test case
	qemu-img 4.1.0-5 changed the behaviour when writing part of a qcow2
	cluster, breaking tests writing the first 4k of a cluster. Change the
	test to write a full cluster, which is less likely to change.

2019-11-13  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	automation: remove fc29 jobs
	The check-network stage will stick to fc29 because there are no
	lago pkgs for fc30 - lago is a required pkg for that target.

	This issue is being tracked on [0].

	[0] - https://ovirt-jira.atlassian.net/browse/OST-135

2019-11-13  Amit Bawer  <abawer@redhat.com>

	storage, tests: Use mailbox constants in mailbox_tests
	Replace raw offsets 0x40, 0x1000 with MESSAGE_SIZE and MAILBOX_SIZE
	respectively.

	storage, mailbox: Concat bytes with bytes for outbox message cleanup
	This resolves the TypeError exception met in the previous patch for
	the python3 test_roundtrip in mailbox_test, so the xfail marker is
	removed for this test.

	tests, storage: Add HSM outbox messages clearing check to roundtrip test
	In this part we verify that the HSM mailer outbox is back to a cleared
	state, meaning that we have completed the mailbox cycles up to the
	messages cleanup in HSM outbox.

	The message clearing flow cycles as follows:

	HSM to SPM: extend request
	SPM to HSM: extend reply
	HSM to SPM: clean message
	SPM to HSM: clean response
	HSM: outbox clearing

	tests, storage: Add roundtrip test to mailbox_test
	Added test covers flow of the HSM-to-SPM extension request message,
	the SPM-to-HSM extension reply message and the resulting clearing of
	the handled message in HSM outbox.

	It also exposes the following exception met in storage py3 testing:

	  File "vdsm/lib/vdsm/storage/mailbox.py", line 330, in _handleResponses
	    MESSAGE_SIZE * "\0" + self._outgoingMail[start +
	TypeError: can't concat str to bytes

	So it is currently marked with xfail_python3 to be repaired in next patch.
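
	The failing expression mixes a str with bytes; multiplying a bytes
	literal instead fixes it. A minimal illustration (MESSAGE_SIZE is
	0x40, as in the mailbox constants; the buffer is fake):

```python
MESSAGE_SIZE = 0x40  # 64-byte messages, as used by the mailbox

outgoing = b"x" * 128  # fake outgoing mail buffer

# Python 3: str * int concatenated with bytes raises TypeError.
try:
    cleared = MESSAGE_SIZE * "\0" + outgoing[MESSAGE_SIZE:]
except TypeError:
    pass

# Using a bytes literal keeps everything bytes.
cleared = MESSAGE_SIZE * b"\0" + outgoing[MESSAGE_SIZE:]
```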

	tests, storage: Amend MAILER_TIMEOUT to 10 seconds
	Also amend test_send_recieved to use the amended timeout
	instead of the MONITOR_INTERVAL * 10 = 2 seconds timeout,
	which may result in flaky behavior during CI rush hours.

2019-11-13  Edward Haas  <edwardh@redhat.com>

	net, tests: Share the bond mapping helpers with the func tests
	When running the network functional tests against the lib, there may be
	a need to map and inject the bond options defaults.

	The integration tests already handle these cases in the test session
	setup, this patch shares them with the functional tests.

	net, tests: Make functional.utils dependency optional
	When running the functional network tests against the lib, there is no
	need for the VDSM client service proxy which is implemented in
	functional.utils.

	In order to run the tests against the lib without running VDSM, the
	dependency on functional.utils is made optional.

	common, hooks: Remove libvirt dependency from hooks
	The common.hooks module is currently dependent on the libvirt package.
	This dependency is unwanted for some users of vdsm.common (like
	vdsm-network).

	Specifically, running the VDSM functional network tests without
	installing VDSM becomes impossible.

	The current dependency is removed by replacing the constant with a 0.

2019-11-13  Nir Soffer  <nsoffer@redhat.com>

	tox: Use the TIMEOUT environment variable
	Replace hardcoded 600 with {env:TIMEOUT:600}.
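
	The resulting tox.ini pattern (a sketch; the actual testenv commands
	differ):

```
[testenv]
commands =
    pytest --timeout {env:TIMEOUT:600} tests/
```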

	tox: Increase storage tests timeout on Jenkins
	Recently we see random timeouts in storage tests on Jenkins. Looks like
	we don't have enough resources and the slaves are overloaded. Increase
	the storage timeout to 1200 seconds when running on Jenkins.

	rhv_build: Enable ovirt-imageio
	ovirt-imageio was disabled when we did not have packages for RHV, but
	we have had the package for a while (upstream), and we should have a
	package for RHEL soon.

	Bug-Url: https://bugzilla.redhat.com/1771051

2019-11-13  Edward Haas  <edwardh@redhat.com>

	net, tests: Relocate ovsnettestlib module to integration.ovs

2019-11-12  Vojtech Juranek  <vjuranek@redhat.com>

	tests: allow change content of fake scsi_id
	In follow-up patches we are going to add more tests for scsi_id, and
	e.g. to mimic command failure we need to be able to modify the fake
	script. Provide a fake executable to the tests and allow them to
	modify its content.

	tests: fix typo in scsi_id fake output

2019-11-12  Amit Bawer  <abawer@redhat.com>

	storage, mailbox: Fix free message slot lookup in HSM mail monitor
	Current logic initializes the freeSlot index as False and checks for
	its vacancy with a negation condition. This falsely ignores the zero
	index as a possible slot, not allowing it to be used for an added
	message.

	This issue was reflected in a raised exception of a full active
	messages list for the previously added test_fill_slots(), so the
	xfail marker is removed here for this test.
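
	The falsy-zero pitfall in miniature (names are illustrative, not the
	actual vdsm code):

```python
def find_free_slot(slots):
    """Return the index of the first free slot, or None."""
    free = None
    for i, used in enumerate(slots):
        if not used:
            free = i
            break
    return free

# With `free = False` and `if not free:` as the vacancy check,
# index 0 would be indistinguishable from "no slot found";
# using None as the sentinel keeps slot 0 usable.
```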

	tests, storage: Add test for full slots case in HSM mailbox
	The test exposes the inability of the HSM mailer to handle more
	than (MESSAGES_PER_MAILBOX - 1 = 62) messages, due to a bug fixed
	in the next patch.

	This raises a worker thread exception:

	Traceback (most recent call last):
	  File "vdsm/lib/vdsm/storage/mailbox.py", line 473, in _run
	    self._handleMessage(message)
	  File "vdsm/lib/vdsm/storage/mailbox.py", line 411, in _handleMessage
	    raise RuntimeError("HSM_MailMonitor - Active messages list full, "
	RuntimeError: HSM_MailMonitor - Active messages list full, cannot add new message

	So in this patch the test is marked as xfail, to be fixed in next patch.

	storage, mailbox, py3: Amend empty message check in SPM mail monitor
	First byte for an empty message should be only null, b"\0".
	Other checked options are invalid and incompatible with Python 3.

	Also adding a test for skipping empty request messages to cover
	the modified code section.

	storage, mailbox, py3: Amend empty message check in HSM mail monitor
	First byte for an empty message should be only null, b"\0".
	Other checked options are invalid and incompatible with Python 3.

	Also adding a test for skipping empty response messages covering the
	modified check.
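
	Checking the first byte of a bytes message portably (a sketch; the
	function name is illustrative):

```python
def is_empty_message(message):
    # On Python 3, message[0] yields an int, so compare a one-byte
    # slice instead; b"\0" is the only valid first byte of an empty
    # message.
    return message[0:1] == b"\0"
```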

	storage, mailbox: Add packed_checksum wrapper for serialized mailbox checksum
	Also add tested values to checksum sanity test.

	storage, mailbox: Replace numBytes parameter by CHECKSUM_BYTES constant for checksum
	We always use the 4 bytes constant for the checksum length, so there
	is no need to pass it explicitly.

	The additional trimming of the result into 32 bits is kept in a more
	readable format for the sake of robustness, in case the checksum is
	used over input data summing up to more than a 4-byte value.

	storage, mailbox: Fix deprecation warning for checksum function
	python 3 deprecated array.fromstring(), resulting in the following
	warning met during mailbox tests:

	vdsm/storage/mailbox.py:603: DeprecationWarning:
	fromstring() is deprecated. Use frombytes() instead.
	  n = checksum(data, CHECKSUM_BYTES)

	This patch replaces the old array API usage with bytearray.
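
	A checksum along the lines described, using bytearray instead of the
	deprecated array.fromstring() and trimming to 32 bits (a sketch; the
	real function may differ):

```python
CHECKSUM_BYTES = 4

def checksum(data):
    """Sum all bytes of data and trim the result to 32 bits."""
    return sum(bytearray(data)) & (2 ** (CHECKSUM_BYTES * 8) - 1)
```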

	storage, tests: Add sanity test for mailbox checksum function
	Adding test for checksum inputs of interest and their expected results.

	Also get rid of the ineffective consistency test checking a deterministic
	checksum function with same random inputs.

	tests, storage: Replace pool mocks with FakeSPMMailer and FakePool classes

	tests, storage: Add extend_message() function to mailbox tests
	Since the extend message is repeated throughout the tests, only with
	different requested sizes, it is useful to add a function which
	generates the message payload formatted with the requested size.

	Also amending the requested sizes to multiples of 128MB to reflect
	the actual use case of volume extension.

	tests, storage: Use volume_data() function instead of a fixed VOL_DATA
	This will allow tests to choose if they need a fixed volume ID or
	a customized one.

2019-11-11  Bell Levin  <blevin@redhat.com>

	net, CI: bump pytest version
	pytest older than 4.3 is not compatible with the new attrs package
	and should be upgraded.

2019-11-11  Milan Zamazal  <mzamazal@redhat.com>

	virt: Add memory hot(un)plug XML to API
	Vdsm code already supports XML based memory hot plug and unplug and
	accepts `xml' parameter, although it is not used by Engine yet.  Let's
	expose the API for future Engine versions, together with making the
	old memory device parameters optional.

	virt: Store balloon target to metadata
	Balloon target may be modified using setBalloonTarget API call.  Let's
	store the current value to metadata to restore it properly later when
	needed.

	hostdev: Add a comment about multiple devices in hot plug
	See the commit message in ae4a5872897e8f68de126ee5b870dcc2333ae82e.

	hostdev: Add an explanation comment about failed hot plug cleanup
	See https://gerrit.ovirt.org/c/42661/16/vdsm/virt/vm.py#2085

	hostdev: Remove support for mdev_type custom property
	It's no longer needed: since Engine 4.2.6, Engine sends device XML
	instead.

	virt: Remove Balloon device
	We support only Engine >= 4.2 now, so there is no need to keep the
	device machinery for balloons and we can remove Balloon class.

	virt: Simplify Graphics device class
	Only XML Engine (>=4.2) is supported now.  That means we needn't
	support the legacy device class and related legacy functionality
	anymore.  The bits from it we still need can be kept in separate
	functions and in a simple class, to keep device setup and teardown
	working properly, out of the complex device tracking and processing
	mechanism.

2019-11-08  Vojtech Juranek  <vjuranek@redhat.com>

	multipath: ignore scsi_id failures
	When the block SD is detached, we disconnect iSCSI targets.
	The corresponding multipath device is removed once all the paths to
	the multipath device are lost, i.e. once we remove all backing
	devices, the multipath device is removed as well. However, this is
	not the case when the multipath device is still open by some process.
	In such a case removing the multipath device fails and multipath
	doesn't retry this operation. As a result, we can end up with a
	multipath device which has no paths. For such a device the scsi_id
	command fails, and most of the block SD operations would fail as
	well as a consequence.

	In the past, when we used deprecated misc.execCmd() for executing
	scsi_id, we ignored such errors silently. We replaced the deprecated
	execCmd() with commands.run() in commit

	  commit f78db940d1e5b3ffd2da58835942c9e333159f5b
	  Author: Vojtech Juranek <vjuranek@redhat.com>
	  Date:   Thu Sep 19 14:35:30 2019 +0200

	commands.run() throws an exception when executed command returns
	non-zero return code. Therefore by switching to commands.run() we
	don't ignore scsi_id failures any more and if there's multipath
	device without any path, related block SD operation fails now.

	Until we figure out how to detach block SD properly and don't leave
	any multipath devices without any path in the system, we should continue
	to ignore such devices as it seems harmless. Catch the cmdutils.Error
	when running scsi_id, log a debug message for such device and return
	empty output as we did in the past.

	Bug-Url: https://bugzilla.redhat.com/1766595
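
	The described pattern, sketched with stdlib subprocess (vdsm's
	actual code uses its internal commands.run()/cmdutils.Error; the
	function signature and flags here are illustrative):

```python
import logging
import subprocess

def get_scsi_serial(device, scsi_id="/usr/lib/udev/scsi_id"):
    """Return the device serial, or "" when scsi_id fails."""
    cmd = [scsi_id, "--page=0x80", "--whitelisted",
           "--device=" + device]
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError) as e:
        # Ignore the failure as harmless, as the commit describes,
        # and return empty output as in the past.
        logging.debug("Ignoring scsi_id failure for %s: %s", device, e)
        return ""
    return out.decode("utf-8").strip()
```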

2019-11-08  Edward Haas  <edwardh@redhat.com>

	net, func tests: Pass unicode to ipaddress.IPv6Interface
	ipaddress.IPv6Interface is used in the functional tests to pack the IPv6
	address. The class expects a unicode string, but on Python 2 a
	string/bytes value is passed instead.

	Fix the issue by using six.text_type.
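
	The requirement in question (six.text_type is str on Python 3 and
	unicode on Python 2); an illustrative address is used:

```python
import ipaddress

# ipaddress.IPv6Interface requires a text (unicode) string; on
# Python 2 a bytes str raises, hence the six.text_type() coercion.
iface = ipaddress.IPv6Interface(u"2001:db8::1/64")
```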

2019-11-08  Eyal Shenitzky  <eshenitz@redhat.com>

	sdm_copy_data_test: use is_alive() instead of isAlive()

2019-11-08  Nir Soffer  <nsoffer@redhat.com>

	multipath: Log time for resizing devices
	Slow device resize can cause failures in other parts of the system. We
	want to make it easy to debug issues caused by slow resizes.

	Here are example logs from OST run:

	2019-10-25 14:24:08,813-0400 INFO  (jsonrpc/6) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)
	2019-10-25 14:24:08,813-0400 INFO  (jsonrpc/6) [storage.ISCSI] Scanning iSCSI devices (iscsi:442)
	2019-10-25 14:24:09,058-0400 INFO  (jsonrpc/6) [storage.ISCSI] Scanning iSCSI devices: 0.24 seconds (utils:390)
	2019-10-25 14:24:09,059-0400 INFO  (jsonrpc/6) [storage.HBA] Scanning FC devices (hba:60)
	2019-10-25 14:24:09,316-0400 INFO  (jsonrpc/6) [storage.HBA] Scanning FC devices: 0.26 seconds (utils:390)
	2019-10-25 14:24:09,357-0400 INFO  (jsonrpc/6) [storage.Multipath] Resizing multipath devices (multipath:104)
	2019-10-25 14:24:09,359-0400 INFO  (jsonrpc/6) [storage.Multipath] Resizing multipath devices: 0.00 seconds (utils:390)
	2019-10-25 14:24:09,360-0400 INFO  (jsonrpc/6) [storage.StorageDomainCache] Refreshing storage domain cache: 0.54 seconds (utils:390)

	We can now see how much time was spent in
	StorageDomainCache.refreshStorage(), and where the time went.

	Bug-Url: https://bugzilla.redhat.com/1765684

	iscsi: Log time for scanning iSCSI devices
	Slow iSCSI rescan can lead to failures in other parts of the system.
	We want to make it easy to debug issues caused by slow iSCSI rescans.

	Here are example logs from OST run:

	2019-10-25 14:24:08,813-0400 INFO  (jsonrpc/6) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)
	2019-10-25 14:24:08,813-0400 INFO  (jsonrpc/6) [storage.ISCSI] Scanning iSCSI devices (iscsi:442)
	2019-10-25 14:24:09,058-0400 INFO  (jsonrpc/6) [storage.ISCSI] Scanning iSCSI devices: 0.24 seconds (utils:390)
	2019-10-25 14:24:09,059-0400 INFO  (jsonrpc/6) [storage.HBA] Scanning FC devices (hba:60)
	2019-10-25 14:24:09,316-0400 INFO  (jsonrpc/6) [storage.HBA] Scanning FC devices: 0.26 seconds (utils:390)
	2019-10-25 14:24:09,360-0400 INFO  (jsonrpc/6) [storage.StorageDomainCache] Refreshing storage domain cache: 0.54 seconds (utils:390)

	Bug-Url: https://bugzilla.redhat.com/1765684

	hba: Log time for scanning FC devices
	Slow fc-scan can lead to failures in other parts of the system. We
	want to make it easy to debug issues caused by slow fc-scan calls.

	Here are example logs from OST run:

	2019-10-25 14:24:08,706-0400 INFO  (jsonrpc/5) [storage.StorageDomainCache] Invalidating storage domain cache (sdc:74)
	...
	2019-10-25 14:24:09,059-0400 INFO  (jsonrpc/6) [storage.HBA] Scanning FC devices (hba:60)
	2019-10-25 14:24:09,316-0400 INFO  (jsonrpc/6) [storage.HBA] Scanning FC devices: 0.26 seconds (utils:390)
	...
	2019-10-25 14:24:09,360-0400 INFO  (jsonrpc/6) [storage.StorageDomainCache] Refreshing storage domain cache: 0.54 seconds (utils:390)

	Bug-Url: https://bugzilla.redhat.com/1765684

	sdc: Log time for slow operations and state changes
	StorageDomainCache slow operations may block storage operations in
	unrelated storage domains, and this is probably the worst part of vdsm
	storage.

	To understand better how much time is spent on slow operations, and when
	we clear or invalidate the cache, add log info for all important state
	changes, and log the time spent in slow operations.

	Slow operations:

	- refreshStorage - does iSCSI/FC scans that can block for up to 30
	  seconds, and invokes multipath.resize_devices(), which may be slow
	  with a large number of devices.

	- _findUnfetchedDomain - Looking for a storage domain in all storage
	  types. In the worst case this lists all VGs and all images
	  directories in all file storage domains. If some storage is not
	  accessible, this can block for 60 seconds on every inaccessible file
	  storage domain.

	Important state changes:

	- Invalidating cache - triggers a refresh on the next access
	- Clearing all storage domains - may trigger a refresh on the next access
	- Adding and removing storage domains

	Here are example logs from OST run:

	$ grep storage.StorageDomainCache vdsm.log
	2019-10-25 14:20:42,080-0400 INFO  (hsm/init) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)
	2019-10-25 14:20:42,700-0400 INFO  (hsm/init) [storage.StorageDomainCache] Refreshing storage domain cache: 0.62 seconds (utils:390)
	2019-10-25 14:23:26,417-0400 INFO  (hsm/init) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)
	2019-10-25 14:23:27,033-0400 INFO  (hsm/init) [storage.StorageDomainCache] Refreshing storage domain cache: 0.62 seconds (utils:390)
	2019-10-25 14:24:08,706-0400 INFO  (jsonrpc/5) [storage.StorageDomainCache] Invalidating storage domain cache (sdc:74)
	2019-10-25 14:24:08,813-0400 INFO  (jsonrpc/6) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)
	2019-10-25 14:24:09,360-0400 INFO  (jsonrpc/6) [storage.StorageDomainCache] Refreshing storage domain cache: 0.54 seconds (utils:390)
	2019-10-25 14:24:10,215-0400 INFO  (jsonrpc/7) [storage.StorageDomainCache] Invalidating storage domain cache (sdc:74)
	2019-10-25 14:24:10,307-0400 INFO  (jsonrpc/0) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)
	2019-10-25 14:24:10,836-0400 INFO  (jsonrpc/0) [storage.StorageDomainCache] Refreshing storage domain cache: 0.53 seconds (utils:390)

	Bug-Url: https://bugzilla.redhat.com/1765684
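	The timed-log pattern quoted above can be sketched as a context
	manager (a minimal illustration; logged_time is a hypothetical name,
	not vdsm's actual utils helper):

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def logged_time(log, msg):
    # Log the start of an operation and, on exit, the elapsed time,
    # mirroring the "...: 0.54 seconds" style of the log lines above.
    log.info("%s", msg)
    start = time.monotonic()
    try:
        yield
    finally:
        log.info("%s: %.2f seconds", msg, time.monotonic() - start)


log = logging.getLogger("storage.StorageDomainCache")

with logged_time(log, "Refreshing storage domain cache"):
    pass  # scan iSCSI/FC devices, resize multipath devices, ...
```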

2019-11-07  Edward Haas  <edwardh@redhat.com>

	net, tests: Remove network/__init__ content
	The network tests package __init__ module can be cleaned of all setup
	steps. All relevant tests that needed the setups are now in the
	integration sub-package, with the same setups in place in the conftest
	module.

	net, tests: Remove dependency from testlib
	The only dependency left on testlib is a small threading helper.
	The helper has been copied to the only test module that uses it.

	net, tests: Convert integration tests to pytest format

	net, tests: Relocate bridge handlers to netintegtestlib

	net, tests: Relocate namespace helpers to netintegtestlib

	net, tests: Introduce iperf helper module
	Relocate integration iperf specific helpers under the integration subpackage.

	net, tests: Convert tc module to pytest format

	net, tests: Introduce netintegtestlib helper module
	Relocate integration specific helpers under the integration subpackage.

	Note: Some relevant helpers are moved in the following patches.

	net, tests: Convert unit tests to pytest format

	net, tests: Convert conf_persistence module to pytest format

2019-11-07  Vojtech Juranek  <vjuranek@redhat.com>

	storage: add missing argument in removeMapping()
	When switching to the new approach of calling the supervdsm API in
	the devicemapper module in

	  commit afc30514c47b2be2384a4501c3175a87c0ed9aaa
	  Author: Vojtech Juranek <vjuranek@redhat.com>
	  Date:   Mon Sep 16 23:13:16 2019 +0200

	the "deviceName" argument was not passed to the removeMapping()
	function. Add this required argument.

	Bug-Url: https://bugzilla.redhat.com/1768735

2019-11-07  Eyal Shenitzky  <eshenitz@redhat.com>

	migration: change log.warn() to log.warning()

	storage: change all log.warn() to log.warning()

2019-11-07  Marcin Sobczyk  <msobczyk@redhat.com>

	travis: Move linters to fc30
	Py3-based linters have been moved to fc30 in jenkins. To keep both CIs
	in sync we're also switching them to fc30 on travis.

	ci: Add missing dependencies
	A travis linters run [1] has shown some missing dependencies
	in container images. This patch adds these to dockerfiles. Additionally
	one of the dependencies (python3-policycoreutils) was also absent
	in our jenkins CI requirements files, so it's also been added there.

	[1] https://travis-ci.org/oVirt/vdsm/jobs/601725955

2019-11-06  Nir Soffer  <nsoffer@redhat.com>

	imagetickets: Remove ovirt_imageio_daemon import
	We want to eliminate dependencies on ovirt-imageio packages in vdsm. The
	right way to use imageio is its HTTP API, not by importing its private
	modules.

	Remove the uhttp import by copying the tiny UnixHTTPConnection class
	into imagetickets module.

	Replace the import check with checking the daemon socket.

2019-11-06  Milan Zamazal  <mzamazal@redhat.com>

	virt: Use wildcard to list the Python files in Makefile.am

	virt: Remove Console device
	With only XML Engine supported, we don't need that device anymore.
	All we need from it is console socket initialization and cleanup,
	which is moved to separate functions.

	virt: Support XML based memory hot plug
	Although Engine doesn't support XML based memory hot plug or hot
	unplug, we already accept `xml' argument in Vm.hotunplugMemory.  Let's
	support it the same way in Vm.hotplugMemory.

	virt: Don't report unhandled devices
	We don't track most kinds of devices as device instances anymore, so
	this debug message is no longer useful.

2019-11-06  Yuval Turgeman  <yturgema@redhat.com>

	network: disable clevis dracut module
	Enabling clevis on an ovirt host requires special handling as the clevis
	dracut module enables default (dhcp) networking in the ramdisk
	regardless of the network configuration on the running system.

	Bug-Url: https://bugzilla.redhat.com/1756944
	Bug-Url: https://bugzilla.redhat.com/1760262

2019-11-05  Edward Haas  <edwardh@redhat.com>

	net, tests: Convert ifcfg config writer module to pytest format

	net, tests: Remove unneeded mocks from ifcfg_config_writer

	net, tests: Replace MonkeyPatch with standard mock

	net, tests: Convert ipwrapper module to pytest format

	net, tests: Replace all SkipTest with pytest.skip

	net, tests: Extract SkipTest from firewall module
	Do not skip from within the firewall helper module.
	Leave it up to the higher stack levels to take the decision.

	As the tests are no longer running nose, convert the skip to a pytest
	skip.

	net, tests: Use network.compat.mock instead of testlib.mock

	net, tests: Rename internal test exception
	The network unit tests are using a custom exception named
	TestException, which causes warnings when running pytest.

	In order to avoid the warnings, the exception is renamed such that it
	will not start with 'Test'.

	net, tests: Move tc tests from nose to pytest
	The test module got split into two test modules, one for unit tests
	and another for integration tests.

	With this move, all network tests are moved to pytest.
	The makefiles are updated to avoid running nose network tests.

	net, tests: Move sriov tests from nose to pytest

	net, tests: Move netupgrade tests from nose to pytest
	The test module got split into two test modules, one for unit tests
	and another for integration tests.

	net, tests: Move netswitch tests from nose to pytest

	net, tests: Move netlink tests from nose to pytest

	net, tests: Move netinfo tests from nose to pytest
	The test module got split into two test modules, one for unit tests
	and another for integration tests.

	net, tests: Load bonding kernel module if missing

2019-11-04  Edward Haas  <edwardh@redhat.com>

	net, tests: Move models tests from nose to pytest

	net, tests: Move lldpad tests from nose to pytest
	The test module got split into two test modules, one for unit tests
	and another for integration tests.

	net, tests: Move link setup tests from nose to pytest

	net, tests: Move ipwrapper tests from nose to pytest
	The test module got split into two test modules, one for unit tests
	and another for integration tests.

	net, tests: Move ip_validator tests from nose to pytest

	net, tests: Move ip_dhclient tests from nose to pytest

	net, tests: Move ip_address tests from nose to pytest
	The test module got split into two test modules, one for unit tests
	and another for integration tests.

	net, tests: Move ifacquire tests from nose to pytest

	net, tests: Move driverloader tests from nose to pytest

	net, tests: Move dpdk tests from nose to pytest

	net, tests: Move dns tests from nose to pytest
	The test module got split into two test modules, one for unit tests
	and another for integration tests.

	net, tests: Move connectivity tests from nose to pytest

	net, tests: Move config network tests from nose to pytest

	net, tests: Move ifcfg tests from nose to pytest
	The test module got split into two test modules:
	- ifcfg_config_writer (integration)
	- ifcfg_acquire (unit)

2019-11-03  Edward Haas  <edwardh@redhat.com>

	net, tests: Move conf persistence tests from nose to pytest

	net, tests: Move canonicalize tests from nose to pytest

2019-11-01  Vojtech Juranek  <vjuranek@redhat.com>

	vdsmd: change KillMode back to control-group
	We've changed KillMode to mixed in

	commit a0b9f2c657f2fd5b32ee613ef28811c3e36f6365
	Author: Yeela Kaplan <ykaplan@redhat.com>
	Date:   Tue Aug 18 18:04:03 2015 +0300

	    service: change vdsm KillMode to mixed

	This doesn't seem to be correct, as child processes are killed
	suddenly with SIGKILL, giving them no chance to terminate properly.

	Switch back to the default control-group, where all processes receive
	SIGTERM first and are killed with SIGKILL after a timeout if they
	haven't terminated yet.

2019-11-01  Nir Soffer  <nsoffer@redhat.com>

	vdsmd.service: Ensure that child processes are killed
	Vdsm depends on systemd to terminate all child processes when vdsmd
	service is stopped. In EL6 times, we depended on cpopen to set a death
	signal for storage child processes, ensuring that commands writing to
	storage are killed when vdsm is stopped, even if vdsm is killed.

	While moving to EL7 and systemd and porting to python 3, we removed
	death signal support, and started to depend on systemd to kill child
	processes.

	commit b9fd772eafcc7b60f3f0e4965183a9da6ef05c97
	Author: Yaniv Bronhaim <ybronhei@redhat.com>
	Date:   Sat Jan 23 19:04:22 2016 +0200

	    Remove deathSignal usages in sync execCmd calls

	Depending on systemd to kill child processes[1] is more correct than
	using death signal, since death signal affects only the direct child
	processes created by vdsm. If a child process creates its own child
	processes, they are not using death signal, and would not be killed when
	vdsm is terminated.

	Turns out that systemd does not terminate child processes if
	ExecStopPost is set[2]. This can leave commands such as qemu-img
	writing to a logical volume after vdsm was stopped. Engine will detect
	that the
	command has failed and will delete the logical volume. A new logical
	volume may share the same storage as the old logical volume, causing the
	left-over qemu-img to write to another volume, possibly corrupting user
	data.

	I think the most robust way to fix this issue would be to avoid the
	dependency on systemd and use volume leases held by the actual program
	accessing storage (e.g. qemu-img), and the SPM commands deleting
	volumes. This requires major changes in storage code, that can be
	planned for future version. For now, we need a minimal change that can
	be ported safely to 4.3 and work with any supported engine version.

	Fix this issue by removing the ExecStopPost from vdsmd service file.
	With this change, systemd terminates child processes as expected.

	ExecStopPost was used only for calling after_vdsm_stop hook. Replace it
	by calling the hook from vdsm main thread, when starting shutdown after
	receiving a termination signal. The downside of this change is that
	after_vdsm_stop will not be called if vdsm is killed.

	Systemd behavior was tested only on Fedora 29, with the reproducer
	scripts (see bug 1761260). This change is not tested yet.

	[1] http://man7.org/linux/man-pages/man5/systemd.kill.5.html
	[2] https://bugzilla.redhat.com/show_bug.cgi?id=1761260#c0

	Bug-Url: https://bugzilla.redhat.com/1759388

2019-10-31  Marcin Sobczyk  <msobczyk@redhat.com>

	ci: Run tests on el8
	This patch enables py3 tests on el8.

	We don't have native-el8 CI slaves yet, so we can't use:

	 host-distro: same

	option anymore. Dropping it would leave us in a state where the CI
	sub-stages could be run on el7 with mocked el8. This causes storage
	tests to fail, since they require a more recent kernel version to run
	successfully. Thus, we switched to:

	 host-distro: newer

	The effect is using native-fc30 slaves with el8 mocked. With this option
	the fc30 sub-stage is also run on native-fc30 slaves.

	ci: Add missing 'python3-ioprocess' package
	'python3-ioprocess' package is needed on el8 to run storage tests. This
	patch adds the 'ovirt-master-snapshot' repository which contains this
	package.

2019-10-30  Dominik Holler  <dholler@redhat.com>

	network: Avoid dracut overwriting ifcfg files
	If dracut detects that networking might be required to boot,
	dracut will overwrite ifcfg files. This might lead to
	unexpected network configuration after boot.
	To prevent this situation, vdsm instructs dracut to avoid
	touching the ifcfg files.

	This becomes relevant now because
	https://gerrit.ovirt.org/#/c/103009/ introduced a dependency on
	clevis-dracut, which, due to bug 1762028, triggers the activation of
	networking and the overwriting of ifcfg files on boot.

	Bug-Url: https://bugzilla.redhat.com/1760262

2019-10-30  Nir Soffer  <nsoffer@redhat.com>

	threadPool: Fix bug in running task count
	If running a task raised an exception, we did not call
	_task_finished(), so the logged running task count is never decreased,
	and all future logs are off by one (for every task failure).
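	The fix boils down to the standard try/finally accounting pattern.
	A minimal sketch (assumed names, not vdsm's actual thread pool code):

```python
import threading


class ThreadPool:
    """Minimal sketch of running-task accounting."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = 0

    def _task_started(self):
        with self._lock:
            self._running += 1

    def _task_finished(self):
        with self._lock:
            self._running -= 1

    def _process(self, task):
        self._task_started()
        try:
            task()
        finally:
            # Decrement even if the task raised, so the running task
            # count can never drift off by one.
            self._task_finished()


pool = ThreadPool()


def failing_task():
    raise RuntimeError("task failed")


try:
    pool._process(failing_task)
except RuntimeError:
    pass
```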

	threadPool: Remove taskCallback
	We never used the option to run a callback with the task return value in
	the last 10 years, remove the callback.

	With this change coverage for this module is 97%.

	threadPool: Make updating running tasks private
	Replace setRunningTasks(True) and setRunningTasks(False) with
	_task_started() and _task_finished() private methods. This is clearer
	and less likely to be used outside of this module.

	threadPool: Remove the option to wait for tasks
	This is useful for a generic thread pool, when you want to wait until
	all tasks are processed and then stop all threads. But this is the
	storage thread pool, and we don't have such a use case.

	With this change coverage for this module is 92%.

	threadPool: Remove online resize feature
	We did not use this feature in the last 10 years, and we don't plan to
	use it.

	Since we don't need to support online resizing, inline the code to start
	the worker threads in __init__(), and replace the call to resize the
	thread count to 0 with single del statement.

	Finally, since we start the threads only in __init__, the _count instance
	variable is not needed now.

	threadPool: Remove unused and unneeded code
	Remove methods that we never used and we don't plan to use. With this
	change, coverage for this module is 91%.

	- Getting number of running tasks
	- Getting current thread count
	- Stopping a worker thread when all queued tasks are processed
	- Reset thread pool so it can be restarted after it was joined
	- Unused _taskThread instance variable

	threadPool: Remove example usage
	The new tests serve also as example usage.

	The example usage code also breaks the coverage report since it is not run
	during the tests. With this change, coverage for this module is 87%.

	tests: Add basic thread pool tests
	The storage thread pool is overcomplicated, inefficient, hard to test,
	and provides unwanted features. Before rewriting it, add tests covering
	the basic behaviour.

2019-10-29  Amit Bawer  <abawer@redhat.com>

	storage: Add missing message to VolumeGeneralException
	This class serves only as a base class for other exceptions, but we
	add the missing message anyway to keep things in order.

	storage: Fix TaskAborted exception
	1. Remove str() method.
	2. Format value member with passed parameters to be returned
	   by the base class str() method along with the message.

	Also fix expected message for task status in Task tests.

	storage: Fix ResourceException class
	1. Add missing class message.
	2. Remove str() method.
	3. Set the value as a formatted string which will be returned with the
	   message by the base class str() method.

	gluster: Use base class msg property for accessing GlusterException.message
	This is meant to keep pylint py3k happy without the bogus complaint:

	************* Module vdsm.gluster.exception
	W: 51,12: Exception.message removed in Python 3 (exception-message-attribute)

	A long term resolution would be refactoring all 'message' exception members
	into 'msg', and getting rid of VdsmException.msg property,
	but this would involve every exception inheriting from VdsmException class.

	py3, storage: log StorageException as string
	Replace the explicit print of the 'message' and 'value' members
	by using the str() method from the base class.

	storage, tests: Add test_info to exception tests
	Provide a sanity test going over all storage exceptions
	inherited from VdsmException.

	The test instantiates the exception objects and verifies they
	can be stringified.
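	Such a sanity test can be sketched as follows (the exception classes
	here are hypothetical stand-ins for vdsm's storage exceptions):

```python
import inspect
import sys


class VdsmException(Exception):
    msg = "Vdsm exception"

    def __str__(self):
        return self.msg


class VolumeError(VdsmException):
    msg = "Volume error"


class ResourceError(VdsmException):
    msg = "Resource error"


def iter_exceptions(module, base=Exception):
    # Yield every exception class in the module inheriting from base.
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, base):
            yield obj


this_module = sys.modules[__name__]

# Instantiate every exception and verify it can be stringified.
messages = [str(cls()) for cls in iter_exceptions(this_module, VdsmException)]
```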

2019-10-29  Edward Haas  <edwardh@redhat.com>

	tox: Statically specify black version
	Black formatter updates may impose new rules which require changes in
	the project code.

	In order to control this, the black version is fixed.
	From time to time the version needs to be updated and with it adjusting
	the code.

2019-10-27  Nir Soffer  <nsoffer@redhat.com>

	lvm: Avoid pointless retries when extending lv
	In every OST run we have 80 extend LV calls, and 20 of them log this
	warning:

	2019-07-12 15:02:41,748-0400 WARN  (tasks/8) [storage.LVM] Command with
	specific filter failed, retrying with a wider filter, cmd=['/sbin/lvm',
	'lvextend', ...] rc=5 err=['  New size given (7 extents) not larger than
	existing size (8 extents)'] (lvm:345)

	This is the flow:

	1. run lvextend to extend the lv with a specific lvm filter
	2. lvextend fails because the lv is already large enough
	3. the retry mechanism assumes that the filter is stale, and
	   invalidates the filter
	4. the retry mechanism runs lvextend again with a wider filter
	5. lvextend fails again because the lv is already large enough
	6. check the lv size and log a debug message that the lv is already
	   large enough

	This flow is not new, but the pointless retries were hidden in the past
	since the retry mechanism was silent. Since 4.3 retries after failures
	log warnings.

	Most of this flow is pointless. Reduce the flow to:

	1. check the lv size before extending and log a debug message if the
	   extend is not needed
	2. run lvextend to extend the lv with specific lvm filter

	Now the command is expected to fail only if the vg does not have enough
	free extents. In this case invalidating the filter and retrying makes
	sense.
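	The reduced flow can be sketched like this (the callables are injected
	stand-ins for the real lvm queries and commands, not vdsm's API):

```python
def extend_lv(lv_name, new_size, get_lv_size, run_lvextend):
    """Extend a logical volume only if it is not already large enough.

    Sizes are in extents. get_lv_size and run_lvextend stand in for
    the real lvm queries and commands.
    """
    if get_lv_size(lv_name) >= new_size:
        # Already large enough - nothing to do, no retry, no warning.
        return False
    # Expected to fail only if the vg is out of free extents; only
    # then is invalidating the lvm filter and retrying worthwhile.
    run_lvextend(lv_name, new_size)
    return True


calls = []
sizes = {"lv1": 8}


def fake_lvextend(name, size):
    calls.append((name, size))
    sizes[name] = size


extended_small = extend_lv("lv1", 7, sizes.get, fake_lvextend)
extended_large = extend_lv("lv1", 16, sizes.get, fake_lvextend)
```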

	storagetestlib: Add Callable result argument
	Add a result argument to Callable.__init__() so we can test the returned
	result. If the result is an exception, Callable will raise the result
	instead of returning it. This allows testing how the storage thread
	pool handles failed tasks, and is probably needed to complete the
	tasks tests.

	storagetestlib: Allow testing called argument
	Add args to __call__() and store the arguments so tests can verify how a
	callable was called. This is needed for testing storage thread pool.

	To allow testing queued tasks, add was_called() method.

	storagetestlib: Move Callable timeout to constructor
	Callable.__call__() accepts a timeout argument, which can be used to
	limit the time a callable will hang during the tests. This is a good
	idea (although unused yet), but we do not control __call__() arguments,
	so better move it to __init__().

	Replace "hang" argument with "hang_timeout". The default value is 0
	keeping the previous behaviour. To create a hanging callback create it
	with the maximum timeout instead of True.

2019-10-25  Amit Bawer  <abawer@redhat.com>

	readme: Add CI sections and travis guidelines for storage

2019-10-25  Nir Soffer  <nsoffer@redhat.com>

	sd: Add getVolumeSize()
	getVolumeSize() replaces getVSize() and getVAllocSize() which are
	deprecated and should not be used in new code.

	On block storage, getVSize and getVAllocSize are the same method, so
	when invoking HSM.getVolumeSize() we call getVSize() twice.

	On file storage both getVSize and getVAllocSize are implemented using
	oop.os.stat(). When invoking HSM.getVolumeSize() we call oop.os.stat()
	twice. Every oop call may timeout after 60 seconds, so in the worst case
	this can cause a delay of 120 seconds.

	In both storage types, if accessing storage is very slow, the caller
	must pay twice for the slow access for no reason. With the new
	getVolumeSize() we access storage only once to get both the apparent
	and the true size.

	This change cannot fix the root cause for slow getVolumeSize() calls,
	but in some cases it may decrease the delay by half.

	Related-to: https://bugzilla.redhat.com/1598266
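	The single-access idea can be sketched for file storage like this
	(a simplified illustration, not the actual sd module API):

```python
import collections
import os
import tempfile

VolumeSize = collections.namedtuple("VolumeSize", ["apparentsize", "truesize"])


def get_volume_size(path):
    # A single stat() call yields both the apparent size and the true
    # (allocated) size, instead of stat-ing the volume twice - on oop
    # file storage every extra call may block for up to 60 seconds.
    st = os.stat(path)
    return VolumeSize(apparentsize=st.st_size, truesize=st.st_blocks * 512)


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    path = f.name

size = get_volume_size(path)
os.unlink(path)
```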

2019-10-25  Vojtech Juranek  <vjuranek@redhat.com>

	storage: replace LVM dict with fresh values
	When loading all LVs, create a new dict from the obtained values and
	replace the LVMCache._lvs dict with it. This ensures we have fresh
	values in the cache after calling LVMCache._loadAllLvs(), and with
	this approach we don't have to deal with removing stale LVs from the
	dict, so the code for removing stale LVs can be deleted now.

	Also rename LVMCache._reloadAllLvs() to _loadAllLvs() to
	better reflect the purpose of this method in its name.

2019-10-25  Nir Soffer  <nsoffer@redhat.com>

	tests: Speed up task manager tests
	Create the task manager with waitTimeout=0.1, speeding up
	prepareForShutdown(wait=True) from 3 seconds to 0.05 seconds.

	Here are slowest task manager tests before this change:

	6.36s call     tests/storage/taskmanager_test.py::test_persistent_job
	3.01s call     tests/storage/taskmanager_test.py::test_stop_clear_task
	3.01s call     tests/storage/taskmanager_test.py::test_revert_task

	And after this change:

	0.33s call     tests/storage/taskmanager_test.py::test_persistent_job
	0.06s call     tests/storage/taskmanager_test.py::test_stop_clear_task
	0.06s call     tests/storage/taskmanager_test.py::test_revert_task

2019-10-25  Amit Bawer  <abawer@redhat.com>

	storage, exception: Remove dictionary copy from ImageDaemonError
	Use less complex code for formatting fields into the exception value.

	storage, tests: Add generator for traversing module exceptions
	Optionally allow specifying a base type to filter out exceptions by base class.

	This will serve the following sanity tests on exceptions searched by
	module and possibly base type.

	common, compat: Add wrapper for inspect get arg spec API
	Since inspect's getargspec(...) from python 2 was replaced by
	getfullargspec(...) in python 3, we provide a compatibility
	wrapper named get_args_spec(...) that hides the differences.
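	The wrapper's intent can be sketched like this (a minimal version of
	the idea, not necessarily vdsm's exact implementation):

```python
import inspect

try:
    # Python 3: getargspec() was removed in favor of getfullargspec().
    _getargspec = inspect.getfullargspec
except AttributeError:
    # Python 2 fallback.
    _getargspec = inspect.getargspec


def get_args_spec(func):
    # Return the positional argument names of func, hiding the
    # getargspec/getfullargspec difference between py2 and py3.
    return _getargspec(func).args


def example(a, b, c=1):
    return a + b + c
```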

2019-10-25  Vojtech Juranek  <vjuranek@redhat.com>

	storage: register xtnd mailbox command as bytes
	Mailbox commands have to be registered as bytes, otherwise when we
	search for a message type, we fail to find it:

	    [storage.MailBox.SpmMailMonitor] SPM_MailMonitor: unknown message type encountered: b'xtnd'

	Use mailbox constant for xtnd command when starting SPM. Use this
	constant also in mailbox test.
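	The pitfall is that a registry keyed by text strings can never match
	a message type parsed out of a bytes buffer. A minimal sketch with
	hypothetical names:

```python
# Message types are parsed out of a raw bytes mailbox buffer, so the
# command registry must be keyed by bytes as well.
EXTEND_COMMAND = b"xtnd"

handlers = {}


def register(command, handler):
    assert isinstance(command, bytes), "commands must be bytes"
    handlers[command] = handler


def dispatch(message):
    command = message[:4]  # the message type is the first 4 bytes
    if command not in handlers:
        raise KeyError("unknown message type encountered: %r" % command)
    return handlers[command](message)


register(EXTEND_COMMAND, lambda msg: ("extend", msg[4:]))

result = dispatch(b"xtnd0001")
```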

2019-10-25  Marcin Sobczyk  <msobczyk@redhat.com>

	vdsm-tool: Drop usage of 'imp' module
	'imp' module is deprecated - this patch replaces its usage with the
	newer 'importlib' module.

	Bug-Url: https://bugzilla.redhat.com/1663661
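	The general replacement pattern looks like this (a generic sketch,
	not vdsm-tool's actual loader code):

```python
import importlib

# Deprecated 'imp' style:
#     fp, path, desc = imp.find_module(name)
#     module = imp.load_module(name, fp, path, desc)
#
# 'importlib' equivalent - no file handles to manage:
def load_module(name):
    return importlib.import_module(name)


json_module = load_module("json")
```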

	vdsm-tool: Drop support for 4.16 upgrade fix
	This patch drops a fix that dealt with importing 'vdsm.tool' from a
	wrong location when upgrading from version 4.16. This is no longer
	relevant.

	After converting imp-based import of 'vdsm.tool' to a regular one,
	pylint found an issue with an exception handling code:

	 E:215,11: Bad except clauses order (UsageError is an ancestor class of ExtraArgsError) (bad-except-order)

	Thus, the order of exception catching has also been changed.

	Bug-Url: https://bugzilla.redhat.com/1663661

	linters: Lint 'vdsm-tool'
	This patch adds 'vdsm-tool' to be covered by flake8 and pylint.

	stdci: Build packages on ppc64le-el8

2019-10-25  Nir Soffer  <nsoffer@redhat.com>

	automation: Remove global archs/distributions
	The global distro/arch definition seems to be redundant, because we
	override it almost everywhere. This makes it harder to understand, and
	causes the CI to try to run default stages when a sub-stage is not
	defined.

	Specify explicit arches/distributions for all stages/sub-stages.

2019-10-24  Pavel Bar  <pbar@redhat.com>

	py3, tests: decorator modification for skipping tests in "backends_test.py"
	Tests should be skipped if sanlock is not available.
	Currently: Python 3 on Fedora 29 or Python 2 on RHEL 8.
	Updating the decorator name and message to be more generic.

2019-10-24  Nir Soffer  <nsoffer@redhat.com>

	fc-scan: Port to python 3
	Looks like the only issues were writing strings instead of bytes and a
	missing absolute import. Also add the missing division import for
	consistent behaviour on python 2 and 3. Tested with python 3.7 on
	Fedora.

	fc-scan: Speed up device scanning
	Rewrite fc-scan using concurrent.tmap(), rescanning devices with 64
	worker threads instead of per-host threads. This gives a 19x speedup
	on a system with 1311 devices and high I/O load, and a 10x speedup
	when the system is idle.

	There is a mutex on each SCSI Host object taken around the scan
	operation when rescanning a host, or a target. So for host scanning, we
	should use thread per host to get maximum parallelism. Rescanning an
	already-discovered device, though, is different and does not take the
	mutex. So there is some potential for parallelism there.

	Previously we used the host threads to rescan devices. On a setup with
	2 fc_hosts we had only 2 threads rescanning devices. Now we wait until
	host scanning is finished, and submit the devices to a new larger thread
	pool for device scanning. This maximizes parallelism during device
	rescan which is the slow part.

	Bug-Url: https://bugzilla.redhat.com/1598266

2019-10-23  Pavel Bar  <pbar@redhat.com>

	py3, tests: enable tests skipped for Python3 in "misc_test.py"
	- Fix tests to work for Python3.
	- Remove "skipif(six.PY3)".

	py3, tests: cosmetic improvements in "misc_test.py"
	Remove the occurrences of the discouraged backslash line continuation
	at the end of lines.
	This is the preparation phase for the "skipif(six.PY3)" removal.

2019-10-23  Marcin Sobczyk  <msobczyk@redhat.com>

	tests: Remove black magic from makefile
	There are no more py3-blacklisted test modules. We've also switched from
	usage of py27/py36/py37 suffixes to py2/py3 ones in the makefiles.
	There's no reason to keep the black magic that was needed before. Let's
	simplify things to make it easier to switch to py3 completely in the
	future.

	py3: tests: Skip 'unicode_test' tests on py3
	Tests inside 'unicode_test' module are used to check the procedures that
	make utf-8 the default encoding for py2 in
	'static/usr/share/vdsm/sitecustomize.py'. They are not relevant and
	can't really work on py3 though due to string vs byte literals issues.

	We can't move this test module to tox since this would skip
	the aforementioned procedures. The only nice way to handle this test
	module is to simply skip the tests inside for py3.

	py3: tests: Move 'stomp_test' to tox
	In an effort to clean up 'tests' directory, we're moving
	'stomp_test' module to 'tests/lib/yajsonrpc' dir and run it with tox.

2019-10-23  Nir Soffer  <nsoffer@redhat.com>

	curl-img-wrap: Add missing __future__ imports
	All storage modules must use the absolute_import and division
	__future__ imports to have consistent behaviour on python 2 and
	python 3.

2019-10-23  Dan Kenigsberg  <danken@redhat.com>

	udevadm: replace obsolete execCmd with run()

	taskset: replace obsolete execCmd with run()

	tests: modprobe: replace deprecated execCmd with run

2019-10-22  Dan Kenigsberg  <danken@redhat.com>

	mkimage_test: replace deprecated execCmd with run()

	janitorial: drop unused files
	If7e157448a stopped using build-aux/pylint-py3k-whitelist, py3k-pylintrc
	and py3-blacklist.txt, but kept them in the code base.

2019-10-22  Eyal Shenitzky  <eshenitz@redhat.com>

	tests: remove class TestCheckResult
	Third step to convert check_test.py to use pytest.
	 - remove class TestCheckResult

	tests: use pytest monkeypatch in check_test.py
	Second step to convert check_test.py to use pytest.
	 - use pytest monkeypatch
	 - use pytest parametrize
	 - remove inheritance from VdsmTestCase

	tests: use pytest assert in check_test.py
	First step to convert check_test.py to use pytest.
	 - remove all self.assertXXX to use assert
	 - rename all setUp/tearDown methods

2019-10-22  Vojtech Juranek  <vjuranek@redhat.com>

	storage: always use lock when modifying lvm dict
	When reloading all LVs, we modify the internal LV dictionary, but
	don't hold the lock while doing it, so it can be modified while
	another thread is iterating over the LVs dict.

	Acquire the lock before we make any changes to the LVs dict. We should
	acquire it before we check for stale LVs, to be sure that the data we
	will use later on for modifying the dict is up to date.
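	The locking rule can be sketched as follows (a minimal stand-in for
	LVMCache, with hypothetical names):

```python
import threading


class LVCache:
    """Minimal sketch: every read and mutation of the lv dict holds
    the lock, so no thread ever iterates over a half-updated dict."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lvs = {}

    def reload(self, fresh):
        # Take the lock *before* checking for stale entries, so the
        # stale set cannot go out of date between the check and the
        # modification.
        with self._lock:
            for name in set(self._lvs) - set(fresh):
                del self._lvs[name]
            self._lvs.update(fresh)

    def names(self):
        with self._lock:
            return list(self._lvs)


cache = LVCache()
cache.reload({"lv1": 1, "lv2": 2})
cache.reload({"lv2": 2, "lv3": 3})
```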

2019-10-21  Dan Kenigsberg  <danken@redhat.com>

	drop vdsm_hook/promisc
	The `promisc` hook was created in order to allow Engine administrators
	to configure a VM that captures all traffic of other VMs. This
	functionality is available via the Port Mirroring option of a vNIC
	profile.

	The hook has not been used or tested for many years. It is time to drop
	its code.

2019-10-21  Marcin Sobczyk  <msobczyk@redhat.com>

	stomp: Fix improper frame terminator search
	A bug has been introduced in [1] - since indexing bytes yields a
	one-byte string in py2 and an int in py3, we forced a slicing
	operation on the buffer when searching for the frame terminator to
	have a consistent result across both interpreter versions.
	Unfortunately, in the case where the buffer holds more data after the
	null byte, the comparison fails, causing the parser to raise an
	exception. This patch fixes the issue by replacing slicing with
	indexing and using different objects for comparison on py2 and py3.

	Additionally, a simple test for parsing multiple frames that covers
	the described case has been added for the 'stomp.Parser' class.

	[1] https://gerrit.ovirt.org/#/c/101114/
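	The underlying py2/py3 difference can be demonstrated with plain
	bytes (a standalone illustration, not the stomp.Parser code):

```python
# On python 3, indexing bytes yields an int, while slicing always
# yields a bytes object (on python 2, indexing yields a one-byte str).
buf = b"BODY\x00MORE DATA"

i = buf.find(b"\x00")  # position of the frame terminator

# Slicing comparison works on both versions:
assert buf[i:i + 1] == b"\x00"

# Indexing comparison: on python 3 the null terminator is the int 0.
assert buf[i] == 0

# Split the buffer into the current frame and the remaining data.
frame, rest = buf[:i], buf[i + 1:]
```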

2019-10-18  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	tests, net, nmstate: initial rollback functional tests passing

	tests, net: Fix improper rollback func test asserts
	Due to wrong indentation, several asserts have not been executed
	at all.

	By fixing the asserts on the rollback tests, a bug is exposed when
	the OVS switch type is used, since, as can be seen in [0], the
	'rollback' option used should be '_inRollback'. The fix to it is
	also contained in this patch.

	[0] - https://github.com/oVirt/vdsm/blob/\
	99cef25660a78390d0b53077c92aab1fb8eac08f/\
	lib/vdsm/network/api.py#L204

	tests, net, nmstate: let NM manage dummies on nmstate backend
	Up to now, we've relied on the NetworkManager configuration to
	indicate that the dummy interfaces created in vdsm's network
	functional tests are to be managed by NetworkManager, as explained
	in [0].

	Despite that, cleanup of these interfaces is done by the test
	framework itself, in [1]. This patch thus centralizes the
	configuration of the dummy interfaces by NetworkManager in the
	framework, as is currently done for veth interfaces.

	This homogenizes the framework: all interface configuration is now
	done the same way, via the test framework, independently of the
	interface type.

	The network functional tests README file is updated to reflect this
	change.

	[0] - https://github.com/oVirt/vdsm/tree/master/tests/network/\
	functional#configuring--installing-the-environment
	[1] - https://github.com/oVirt/vdsm/blob/master/tests/network/\
	nettestlib.py#L263

	tests, net: move helper func to the tail of the rollback tests

2019-10-17  Miguel Duarte Barroso  <mdbarroso@redhat.com>

	net, nmstate: compress ipv6 static addresses
	When creating the NetworkConfig object, store all ipv6 static
	addresses in their compressed notation, thus working around bug [0]
	in nmstate.

	[0] - https://bugzilla.redhat.com/1760800
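
	The workaround can be sketched with the stdlib ipaddress module (the
	helper name is illustrative; vdsm's actual implementation may
	differ):

```python
import ipaddress

def compress_ipv6(address):
    """Return the compressed (canonical) notation of an IPv6 address."""
    # str() of an IPv6Address is always the compressed form, so storing
    # addresses this way sidesteps nmstate's literal string comparison.
    return str(ipaddress.IPv6Address(address))

exploded = "fdb3:84e5:4ff4:0000:0000:0000:0000:0001"
assert compress_ipv6(exploded) == "fdb3:84e5:4ff4::1"
```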

	integ tests, net: test exploded notation ipv6 addresses
	When used against the nmstate backend, it is not possible to
	correctly configure logical networks using static ipv6 addresses in
	the exploded notation format, since nmstate interprets the exploded
	and compressed notations of the same IP address as different
	addresses. This leads to a failure when verifying that the desired
	state was applied.

	The supplied test also fails against the init script backend,
	but for a different reason: the kernel configuration and the stored
	configurations are different, since their IPv6 address formats
	differ.

	This patch adds an xfailed test that checks this behavior.

	move helper func to the tail of the TestNetworkStaticIpBasic tests

2019-10-17  Kaustav Majumder  <kmajumde@redhat.com>

	gluster: Fix errors related to patch https://gerrit.ovirt.org/#/c/103508/
	The above patch throws an exception:
	    TypeError: can only concatenate list (not "unicode") to list
	This patch fixes it.

	Bug-Url: https://bugzilla.redhat.com/1673277
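
	The failure mode can be reproduced with a short py3 snippet (py3
	reports "str" where py2 reports "unicode"; the argument names below
	are illustrative, not the actual gluster code):

```python
args = ["volume", "geo-replication"]

# Broken: list + str raises TypeError
# ("can only concatenate list (not \"str\") to list" on py3).
try:
    args + "status"
    raise AssertionError("expected TypeError")
except TypeError:
    pass

# Fixed: wrap the string in a list before concatenating.
args = args + ["status"]
assert args == ["volume", "geo-replication", "status"]
```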

2019-10-17  Marcin Sobczyk  <msobczyk@redhat.com>

	py3: tests: Port 'bridge_test' to py3
	The "imp" module is deprecated and its usage should be removed.

	This patch redefines the creation of dynamic "vdsm.API" module in terms
	of newer "importlib" module. Unfortunately this module is very limited
	on py2, so we're dropping py2-compatibility for these tests.

	Python 3 also doesn't support negative values for the "level" argument
	passed to "__import__" [1] (which means an attempt of both absolute
	and relative import on py2), so we're switching to the default value
	of 0 (which means absolute imports only) since it works for us.

	[1] https://docs.python.org/3/library/functions.html?highlight=__import__#__import__

	Bug-Url: https://bugzilla.redhat.com/1663661
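
	Creating a dynamic module with importlib can be sketched as follows
	(the module name here is a stand-in, not vdsm's actual "vdsm.API"
	wiring):

```python
import importlib.util
import sys

def make_dynamic_module(name):
    """Create an empty module and register it in sys.modules."""
    # A spec with no loader is enough for a purely synthetic module.
    spec = importlib.util.spec_from_loader(name, loader=None)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    return module

api = make_dynamic_module("fake_vdsm_api")
api.ping = lambda: "pong"
assert sys.modules["fake_vdsm_api"].ping() == "pong"
```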

	py3: tests: Move 'bridge_test' to tox
	In an effort to clean up the 'tests' directory, we're moving the
	'bridge_test' module to the 'tests/lib/' dir and running it with tox.

	Since the tests don't work on py3 yet, we're marking them with xfail.

	py3: tests: Move 'protocoldetector_test' to tox
	In an effort to clean up the 'tests' directory, we're moving the
	'protocoldetector_test' module to the 'tests/lib' dir and running
	the module with tox.

	To make the module work with py3, a couple of string vs. bytes
	issues needed to be fixed in the test module itself.

	py3: utils: Fix '_parseCmdLine' and 'utils_test'
	The 'utils._parseCmdLine' function is used to read out a process'
	arguments from its '/proc/${pid}/cmdline' file. The issue is that it
	tries to split the file contents (bytes) with a string pattern,
	which doesn't work in py3.

	The test that uses the function was also broken on py3 - it compared
	the read data (bytes) with string literals (unicode) passed
	to 'commands.start'.

	Additionally a call to deprecated 'execCmd' function has been replaced
	with 'commands.run'.
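
	The fix can be sketched as follows: /proc/<pid>/cmdline is
	NUL-separated bytes, so on py3 it must be split with a bytes pattern
	(the helper name below is illustrative):

```python
def parse_cmdline(data):
    """Split raw /proc/<pid>/cmdline contents into argument bytes."""
    # data.split('\0') raises TypeError on py3; use a bytes separator.
    return [arg for arg in data.split(b"\0") if arg]

raw = b"/usr/bin/python3\0-m\0coverage\0"
assert parse_cmdline(raw) == [b"/usr/bin/python3", b"-m", b"coverage"]
```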

	py3: tests: Move 'utils_test' to tox
	In an effort to clean up the 'tests' directory, we're moving the
	'utils_test' module to the 'tests/lib/' dir and running it with tox.

	One of the tests is failing when running with py3 so it's been
	marked with an xfail.

	py3: tests: Remove usage of 'TestCaseBase' in 'stompasyncdispatcher_test'
	In an effort to move away from homegrown 'testlib' and deprecated
	'nose' testing framework, we're rewriting our tests to 'pytest'
	and running them with 'tox'.

	This patch removes the usage of 'testlib.TestCaseBase' class
	in 'stompasyncdispatcher_test' module.

	py3: tests: Move 'stompasyncdispatcher_test' to tox
	In an effort to clean up the 'tests' directory, we're moving the
	'stompasyncdispatcher_test' module to the 'tests/lib/yajsonrpc' dir
	and running the module with tox.

	One of the tests was failing when running with py3 due to the 'body'
	not being encoded to bytes so it has been fixed.

2019-10-14  Marcin Sobczyk  <msobczyk@redhat.com>

	automation: Add el8 repos and packages for check-patch
	This patch adds the dependency list and the repo file needed to run
	'check-patch' sub-stages with el8. Currently some of the packages needed
	to run the tests are missing (namely: mom, python3-ioprocess) and some
	need to be pulled from Sandro's internal repo. When this is fixed,
	we should drop the internal repo and add back the missing packages.

	automation: Fix python coverage usage
	Fedora Rawhide [1] and CentOS 8 'python3-coverage' packages don't
	provide 'python3-coverage' file we've been using so far. This patch
	switches to 'python3 -m coverage' form of using it which is compatible
	with both older package versions and the new one.

	[1] https://rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/p/python3-coverage-4.5.4-3.fc32.x86_64.html

	stdci: Rename tests-py{27,37} sub-stages to tests-py{2,3}
	When writing patches around the build system that allowed running
	tests/building with multiple interpreter versions, a scenario
	where one wants to run multiple minor versions of the same major
	interpreter version (i.e. both py36 and py37) was considered.
	Although it's theoretically possible, it causes many hurdles - we
	rely on python libraries that come from RPMs, and these are built
	against a single minor version on each OS. We therefore want to
	switch from 'py{27,36,37}' suffixes to 'py{2,3}' ones.

	This patch renames the 'tests-py{27,37}' sub-stages of 'check-patch'
	stage in stdci.yaml file to 'tests-py{2,3}'.

	tests: Rename 'NETWORK_PY*_COVERAGE' variables
	When writing patches around the build system that allowed running
	tests/building with multiple interpreter versions, a scenario
	where one wants to run multiple minor versions of the same major
	interpreter version (i.e. both py36 and py37) was considered.
	Although it's theoretically possible, it causes many hurdles - we
	rely on python libraries that come from RPMs, and these are built
	against a single minor version on each OS. We therefore want to
	switch from 'py{27,36,37}' suffixes to 'py{2,3}' ones.

	This patch changes the names of the variables that define the minimal
	allowed code coverage for network tests from
	'NETWORK_PY{27,36,37}_COVERAGE' to 'NETWORK_COVERAGE'.

	makefile: Rename check-py{27,37} targets to check-py{2,3}
	When writing patches around the build system that allowed running
	tests/building with multiple interpreter versions, a scenario
	where one wants to run multiple minor versions of the same major
	interpreter version (i.e. both py36 and py37) was considered.
	Although it's theoretically possible, it causes many hurdles - we
	rely on python libraries that come from RPMs, and these are built
	against a single minor version on each OS. We therefore want to
	switch from 'py{27,36,37}' suffixes to 'py{2,3}' ones.

	This patch renames the 'check-py{27,36,37}' targets in 'tests/Makefile.am'
	to 'check-py{2,3}'.

	tox: Add 'gluster-py37' environment
	This patch adds the missing 'gluster-py37' tox environment that runs
	gluster tests on py37.

	hooks: Improve hooks' environment preparation
	Python 3 handles encoding env variables by itself, so no manual
	encoding is needed. For Python 2 we're switching to the default
	system encoding instead of forcing utf-8.

	In our 'tox.ini' file we switch to 'C' locale [1], which happens
	to be 'ascii' or 'ANSI_X3.4-1968' (one of ascii's revisions [2])
	on some of our slaves. There's no chance that localized characters will
	work at all with an ascii locale - the test case is expected to fail in
	this situation, so it's been marked with xfail.

	[1] https://github.com/oVirt/vdsm/blob/402bde5a94cfc0f253d966e54909a5c225694bc5/tox.ini#L15
	[2] https://stackoverflow.com/questions/48743106/whats-ansi-x3-4-1968-encoding
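
	The described behavior can be sketched as follows (prepare_env is a
	hypothetical helper, and the choice of sys.getfilesystemencoding()
	as the "system default" is an assumption, not vdsm's exact code):

```python
import sys

def prepare_env(env):
    """Return an env mapping suitable for the running interpreter.

    py3 encodes env variables itself; on py2 we would encode with the
    system default encoding instead of forcing utf-8.
    """
    if sys.version_info[0] >= 3:
        return dict(env)  # no manual encoding needed on py3
    encoding = sys.getfilesystemencoding() or "ascii"  # assumed default
    return {k.encode(encoding): v.encode(encoding) for k, v in env.items()}

assert prepare_env({"_hook_domxml": "/tmp/x"}) == {"_hook_domxml": "/tmp/x"}
```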

	tests: ssl: Mark tls1 and tls1.1 tests with skipif
	CentOS 8's default crypto policy doesn't allow usage of tls v1 and
	tls v1.1. SSL tests that try to use 'openssl' with the
	aforementioned versions won't work.

	py3: Replace pylint-py3k with py3-based pylint
	'pylint-py3k' was here to help us migrate our codebase to py3. Since
	we're mostly done with the effort, we're replacing it with proper
	py3-based pylint run.

	The original, py2-based 'pylint' tox environment and makefile target
	is being renamed to 'pylint-py2', while 'pylint' equivalents mean
	a py3-based run now.

	Pylint==1.9.3 doesn't work with py37 and spits out tons of unrelated
	errors like these:

	 F:  1, 0: <class 'RuntimeError'>: generator raised StopIteration (astroid-error)

	so for py3 run we're bumping to the latest 2.4.0 version.

	ci: Add py3-based flake8 run to linters
	Until now we were running flake8 with py2 only. This patch changes the
	'flake8' makefile target and the 'flake8' tox environment to run flake8
	with py3. The py2-based flake8 run is still available with 'flake8-py2'
	tox env or makefile target. The new py3-based flake8 run is also being
	added to 'lint-py3' target, and hence, to the py3-based linters sub-stage
	we run on fc30.

	ci: Split linters into py2- and py3-based ones
	In an effort to use py3-based linting, we're introducing a fc30-based
	'linters' sub-stage along with 'lint-py3' makefile target. Currently
	the sub-stage runs 'gitignore', 'execcmd' and 'black' linters.
	Subsequent patches will add a py3-based 'flake8' run and replace
	'pylint-py3k' run with a py3-based 'pylint' run.

	Makefile targets are implemented in the same manner as the 'tests'
	target -
	by also introducing intermediary 'lint-target' and 'lint-all' targets.

	- 'lint' and 'lint-all' run linters for all supported Python versions
	- 'lint-target' runs linters only for the target Python version
	- 'lint-py2' and 'lint-py3' are the actual target implementations
	  for each Python version

2019-10-12  Nir Soffer  <nsoffer@redhat.com>

	helpers: Fix storage helpers on CentOS 8 build
	In CentOS 8, "fallocate" and "managedvolume-helper" fail to run with:

	    vdsm.common.cmdutils.Error: Command ['../helpers/fallocate', '4096',
	    '/var/tmp/vdsm/test_allocate0/image'] failed with rc=127 out=b''
	    err=b'ionice: failed to execute ../helpers/fallocate: No such file
	    or directory\n'

	This happens because the scripts use:

	    #!/usr/bin/python2

	but python2 is not installed on CentOS 8.

	Fix by running the scripts using sys.executable:

	    sys.executable /path/to/script

	This is ugly (it assumes the scripts are python), but it is the
	simplest way to have them work on both python 2 and 3 and enable
	the CentOS 8 build.
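
	A minimal, self-contained sketch of the approach (the temporary
	script here stands in for vdsm's helper scripts):

```python
import subprocess
import sys
import tempfile

# Write a throwaway helper script; in vdsm the scripts already exist.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("print('ok')\n")
    script = f.name

# Run the helper with the same interpreter as the caller, ignoring
# whatever #! line the script may carry.
out = subprocess.check_output([sys.executable, script])
assert out.strip() == b"ok"
```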

	tests: Fix gluster tests helpers on CentOS 8
	The helper fails on CentOS 8 trying to run /usr/bin/python2. Remove the
	execute bit and run using sys.executable.

	sdc: Remove unused storage_repo
	The value is always sc.REPO_DATA_CENTER and it is unused.

2019-10-11  Nir Soffer  <nsoffer@redhat.com>

	tests: Fix sigutils_test.py on CentOS 8 build
	The test runs a child process using a python script with
	#!/usr/bin/python2, which cannot work on CentOS 8. Fix by running it
	with sys.executable.

	Since the script does not need to be executable now, remove the execute
	bit and the #! line.

2019-10-11  Marcin Sobczyk  <msobczyk@redhat.com>

	py3: storagedev: Fix usage of LVMThinLogicalVolumeDevice
	With [1] blivet moved from multiple-classes to single-class-with-mixins
	implementation for LVM devices. The 'LVMThinLogicalVolumeDevice' class
	doesn't exist in newer library versions. The proper way to create a
	thin LV seems to be to use the 'LVMLogicalVolumeDevice' class with
	'seg_type' set to 'thin' [2].

	[1] https://github.com/storaged-project/blivet/commit/9b4f06096d6f723bdcdd9d72509c2278db03632e
	[2] https://github.com/storaged-project/blivet/blob/86bdc2a2114fdf465bcfa4aa0cbfe3af547ebda3/tests/devices_test/lvm_test.py#L62
